Chicago Free Museum Days - Val Town Implementation Spec

Overview

A Bluesky bot that posts weekly updates about free museum days in Chicago, built on Val Town with SQLite storage, automated scraping, and a public API.

Architecture

Val Town Structure

Vals:
├── scrapeChooseChicago    (function - extracts data from Choose Chicago)
├── scraper                (cron - orchestrates all scraper functions)
├── utils                  (function - normalizeDates, date expansion helpers)
├── db                     (function - loadToSQLite, query functions)
├── bot                    (cron - postToBluesky, runs Monday mornings)
└── api                    (HTTP - Hono API for data access)

SQLite Database:
├── scraped_free_days      (auto-populated by scrapers)
├── free_days_overrides    (manual corrections/additions)
└── Union query merges both (overrides win on conflicts)

Data Flow

Choose Chicago (HTML)
    ↓
scrapeChooseChicago() → [{name, rawDates[], url, notes}]
    ↓
normalizeDates() → [{institution, date, time, proof_required, notes}]
    ↓
loadToSQLite() → scraped_free_days table
    ↓
Query (union with overrides)
    ↓
[Branch 1] postToBluesky() → Bluesky post
[Branch 2] API endpoint → JSON response

Database Schema

Table: scraped_free_days

Auto-populated by scrapers. Replaced on each scraper run.

CREATE TABLE IF NOT EXISTS scraped_free_days (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  institution TEXT NOT NULL,
  institution_key TEXT NOT NULL,   -- Slugified institution name for matching
  date TEXT NOT NULL,              -- ISO format YYYY-MM-DD
  time TEXT,                       -- e.g., "9am-5pm", "11am-close"
  proof_required TEXT,             -- e.g., "Illinois ID required", "ZIP code verification"
  notes TEXT,                      -- Additional info (holidays, registration required, etc.)
  source TEXT NOT NULL,            -- Source identifier (e.g., "choose-chicago")
  source_url TEXT,                 -- Where this data was scraped from
  scraped_at TEXT NOT NULL,        -- ISO timestamp of scrape
  UNIQUE(institution_key, date)    -- Prevent duplicate entries per institution/date
);

CREATE INDEX idx_scraped_date ON scraped_free_days(date);
CREATE INDEX idx_scraped_institution_key ON scraped_free_days(institution_key);
CREATE INDEX idx_scraped_source ON scraped_free_days(source);

Table: free_days_overrides

Manual corrections for when scrapers fail or data needs fixing. Can also suppress incorrect scraped data.

CREATE TABLE IF NOT EXISTS free_days_overrides (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  institution TEXT NOT NULL,
  institution_key TEXT NOT NULL,   -- Slugified institution name for matching
  date TEXT NOT NULL,              -- ISO format YYYY-MM-DD
  time TEXT,
  proof_required TEXT,
  notes TEXT,
  is_cancelled INTEGER DEFAULT 0,  -- Set to 1 to suppress a scraped entry
  override_reason TEXT,            -- Why this override exists
  created_at TEXT NOT NULL,        -- When override was added
  UNIQUE(institution_key, date)
);

CREATE INDEX idx_override_date ON free_days_overrides(date);
CREATE INDEX idx_override_institution_key ON free_days_overrides(institution_key);
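
For example, suppressing a bad scraped entry from the Val Town console could look like the sketch below. It assumes the std sqlite client; the addOverride helper in Step 1 would wrap the same INSERT, and the values are purely illustrative.

import { sqlite } from "https://esm.town/v/std/sqlite";

// Cancel a scraped entry that turned out to be wrong (illustrative values)
await sqlite.execute({
  sql: `INSERT INTO free_days_overrides
          (institution, institution_key, date, is_cancelled, override_reason, created_at)
        VALUES (?, ?, ?, 1, ?, ?)`,
  args: [
    "Art Institute of Chicago",
    "art-institute-of-chicago",
    "2025-03-03",
    "Scraper picked up a members-only preview day",
    new Date().toISOString(),
  ],
});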

Query Logic

When querying for free days, overrides take precedence over scraped data. Use a LEFT JOIN anti-join for deterministic results (the earlier GROUP BY approach relied on SQLite's non-deterministic handling of bare columns in grouped queries):

-- Get free days for a date range (overrides win, cancelled entries excluded)
SELECT s.institution, s.date, s.time, s.proof_required, s.notes
FROM scraped_free_days s
LEFT JOIN free_days_overrides o
  ON o.institution_key = s.institution_key AND o.date = s.date
WHERE s.date BETWEEN ? AND ?
  AND o.id IS NULL                        -- No override exists for this entry

UNION ALL

SELECT o.institution, o.date, o.time, o.proof_required, o.notes
FROM free_days_overrides o
WHERE o.date BETWEEN ? AND ?
  AND COALESCE(o.is_cancelled, 0) = 0     -- Exclude cancelled entries

ORDER BY date, institution;

Implementation Steps

Step 1: Database Setup (db val)

Create the db val with:

Functions:

  • initDatabase() - Create tables and indexes
  • loadToSQLite(data, source) - Insert scraped data (deletes old entries from same source first)
  • getFreeDays(startDate, endDate) - Query with override logic
  • addOverride(institution, date, details) - Manual correction helper
  • getUpcomingWeek() - Helper for bot (next 7 days from today)
  • getLatestScrapeTime() - Get timestamp of most recent scrape (for staleness check)
  • slugify(name) - Generate institution_key from institution name
  • chicagoDate(date?) - Get ISO date string in America/Chicago timezone

Implementation:

import { sqlite } from "https://esm.town/v/std/sqlite";
import type { FreeDayEntry } from "./utils";

// Get ISO date string in Chicago timezone (avoids UTC date offset issues)
export function chicagoDate(d = new Date()): string {
  const parts = new Intl.DateTimeFormat("en-CA", {
    timeZone: "America/Chicago",
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).formatToParts(d);
  const y = parts.find((p) => p.type === "year")!.value;
  const m = parts.find((p) => p.type === "month")!.value;
  const day = parts.find((p) => p.type === "day")!.value;
  return `${y}-${m}-${day}`;
}

// Generate a stable key for matching institutions across sources/overrides
export function slugify(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-|-$/g, "");
}

// Initialize tables
export async function initDatabase() {
  // Create scraped_free_days table
  // Create free_days_overrides table
  // Create indexes
}

// Load scraped data (replaces old scraped data from same source)
export async function loadToSQLite(data: Array<FreeDayEntry>, source: string) {
  // Use sqlite.batch() for atomic transaction
  await sqlite.batch([
    { sql: "DELETE FROM scraped_free_days WHERE source = ?", args: [source] },
    ...data.map((entry) => ({
      sql: `INSERT INTO scraped_free_days
              (institution, institution_key, date, time, proof_required, notes, source, source_url, scraped_at)
            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)`,
      args: [
        entry.institution,
        slugify(entry.institution),
        entry.date,
        entry.time,
        entry.proof_required,
        entry.notes,
        source,
        entry.source_url,
        new Date().toISOString(),
      ],
    })),
  ]);
}

// Query with override logic
export async function getFreeDays(startDate: string, endDate: string) {
  // LEFT JOIN anti-join query as shown above
}

// Helper for bot (uses Chicago timezone)
export async function getUpcomingWeek() {
  const today = chicagoDate();
  const nextWeekDate = new Date();
  nextWeekDate.setDate(nextWeekDate.getDate() + 7);
  const nextWeek = chicagoDate(nextWeekDate);
  return getFreeDays(today, nextWeek);
}

// Check data freshness for bot staleness warning
export async function getLatestScrapeTime(): Promise<string | null> {
  const result = await sqlite.execute(
    "SELECT MAX(scraped_at) as latest FROM scraped_free_days",
  );
  return result.rows[0]?.latest ?? null;
}
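
getFreeDays is left as a stub above; one possible implementation wires the anti-join query from "Query Logic" through sqlite.execute. This is a sketch: the named-column row access mirrors getLatestScrapeTime above and may need adjusting for your sqlite client version.

export async function getFreeDays(startDate: string, endDate: string) {
  const result = await sqlite.execute({
    sql: `SELECT s.institution, s.date, s.time, s.proof_required, s.notes
          FROM scraped_free_days s
          LEFT JOIN free_days_overrides o
            ON o.institution_key = s.institution_key AND o.date = s.date
          WHERE s.date BETWEEN ? AND ? AND o.id IS NULL
          UNION ALL
          SELECT o.institution, o.date, o.time, o.proof_required, o.notes
          FROM free_days_overrides o
          WHERE o.date BETWEEN ? AND ? AND COALESCE(o.is_cancelled, 0) = 0
          ORDER BY date, institution`,
    args: [startDate, endDate, startDate, endDate],
  });

  // Shape rows as FreeDayEntry-like objects for the bot and API
  return result.rows.map((row: any) => ({
    institution: row.institution,
    date: row.date,
    time: row.time,
    proof_required: row.proof_required,
    notes: row.notes,
  }));
}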

Step 2: Scraper (scrapeChooseChicago val)

Dependencies:

  • cheerio (HTML parsing)

Function signature:

export interface RawFreeDayData {
  institution: string;
  rawLines: string[]; // All lines from HTML (dates AND eligibility notes mixed)
  url?: string;
}

export async function scrapeChooseChicago(): Promise<RawFreeDayData[]>;

Scraping logic:

  1. Fetch HTML from Choose Chicago (with timeout and User-Agent)
  2. Parse structure:
    • Find H3 headings (institution names)
    • Extract following <ul><li> lists (all text, not just dates)
  3. Return structured data

Important: Choose Chicago mixes date info and eligibility notes in the same <li> list. The normalizer will classify each line.

Example HTML structure to parse:

<h3>Art Institute of Chicago</h3>
<ul>
  <li>January 5 - February 28 (weekdays only, 11am-close)</li>
  <li>Illinois residents with ID</li>
</ul>

Pseudocode:

import * as cheerio from "npm:cheerio";

const SOURCE_URL =
  "https://www.choosechicago.com/articles/museums-art/free-museum-days-in-chicago/";

// Fetch with timeout and polite User-Agent
const controller = new AbortController();
setTimeout(() => controller.abort(), 15_000);
const response = await fetch(SOURCE_URL, {
  signal: controller.signal,
  headers: { "User-Agent": "chicago-free-museum-days-bot/1.0" },
});
const html = await response.text();

const $ = cheerio.load(html);
const results = [];

$("h3").each((i, el) => {
  const institution = $(el).text().trim();
  const lines = [];

  // Find following ul/li elements (contains dates AND eligibility notes)
  $(el)
    .nextUntil("h3", "ul")
    .find("li")
    .each((j, li) => {
      lines.push($(li).text().trim());
    });

  results.push({ institution, rawLines: lines, url: SOURCE_URL });
});

return results;

Step 3: Utilities (utils val)

Dependencies:

  • chrono-node (natural language date parsing)

Functions:

  • classifyLine(line) - Determine if a line is date-ish or note-ish
  • normalizeDates(rawData) - Parse raw strings into structured dates
  • expandPattern(pattern, institution, notes) - Expand "every Wednesday" into individual dates
  • parseTimeRange(timeStr) - Normalize time formats

Key logic:

As noted in Step 2, Choose Chicago mixes date info and eligibility notes in the same <li> list, so the normalizer uses a two-pass approach:

  1. Classify each line as date-ish or note-ish
  2. Aggregate note-ish lines into proof_required / notes for the institution
  3. Expand date-ish lines into individual FreeDayEntry records
import * as chrono from "npm:chrono-node";
import type { RawFreeDayData } from "./scrapeChooseChicago";

export interface FreeDayEntry {
  institution: string;
  date: string; // ISO YYYY-MM-DD
  time?: string;
  proof_required?: string;
  notes?: string;
  source_url?: string;
}

// Patterns that indicate eligibility/requirements rather than dates
const NOTE_PATTERNS = [
  /illinois/i,
  /resident/i,
  /\bID\b/,
  /required/i,
  /must\b/i,
  /free for/i,
  /tickets/i,
  /registration/i,
  /zip code/i,
  /proof/i,
];

// Patterns that indicate date information
const DATE_PATTERNS = [
  /january|february|march|april|may|june|july|august|september|october|november|december/i,
  /\d{1,2}(st|nd|rd|th)?/,
  /every\s+(monday|tuesday|wednesday|thursday|friday|saturday|sunday)/i,
  /weekdays?/i,
  /through|to|-/,
];

function classifyLine(line: string): "date" | "note" {
  const noteScore = NOTE_PATTERNS.filter((p) => p.test(line)).length;
  const dateScore = DATE_PATTERNS.filter((p) => p.test(line)).length;

  // If it looks more like a note, classify as note
  if (noteScore > dateScore) return "note";
  // If it has date indicators, classify as date
  if (dateScore > 0) return "date";
  // Default to note (safer to not generate bad date entries)
  return "note";
}

export async function normalizeDates(
  rawData: RawFreeDayData[],
): Promise<FreeDayEntry[]> {
  const results: FreeDayEntry[] = [];

  for (const item of rawData) {
    // Pass 1: Classify all lines
    const dateLines: string[] = [];
    const noteLines: string[] = [];
    for (const line of item.rawLines) {
      if (classifyLine(line) === "date") {
        dateLines.push(line);
      } else {
        noteLines.push(line);
      }
    }

    // Aggregate notes for this institution
    const aggregatedNotes = noteLines.join("; ") || undefined;

    // Pass 2: Parse date lines
    for (const dateLine of dateLines) {
      // Try chrono-node parsing
      const parsed = chrono.parse(dateLine);
      if (parsed.length > 0) {
        // Extract date range if present, expand into individual dates
        // Add aggregatedNotes to each entry
      } else if (/every/i.test(dateLine)) {
        // Handle recurring patterns
        const expanded = expandPattern(dateLine, item.institution, aggregatedNotes);
        results.push(...expanded);
      } else {
        // Log unparseable entry for manual review
        console.warn("Could not parse date line:", dateLine);
      }
    }
  }

  return results;
}

// Expand "every Wednesday" into next 12 months of Wednesdays
function expandPattern(
  pattern: string,
  institution: string,
  notes?: string,
): FreeDayEntry[] {
  // Parse day of week
  // Generate dates for next 12 months
  // Return array of entries with notes attached
}
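
expandPattern is only outlined above; a minimal sketch of one way to fill it in follows. It walks forward day by day for roughly a year; the UTC-based date string is a simplification, and chicagoDate from the db val would be the timezone-safe choice.

const DAY_NAMES = [
  "sunday", "monday", "tuesday", "wednesday",
  "thursday", "friday", "saturday",
];

// Expand e.g. "every Wednesday" into dated entries for the next 12 months
function expandPattern(
  pattern: string,
  institution: string,
  notes?: string,
): FreeDayEntry[] {
  const match = pattern
    .toLowerCase()
    .match(/every\s+(sunday|monday|tuesday|wednesday|thursday|friday|saturday)/);
  if (!match) return [];

  const targetDay = DAY_NAMES.indexOf(match[1]);
  const entries: FreeDayEntry[] = [];
  const cursor = new Date();

  // Step forward one day at a time, keeping dates that land on the target weekday
  for (let i = 0; i < 365; i++) {
    if (cursor.getDay() === targetDay) {
      entries.push({
        institution,
        date: cursor.toISOString().slice(0, 10), // YYYY-MM-DD
        notes,
      });
    }
    cursor.setDate(cursor.getDate() + 1);
  }
  return entries;
}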

Step 4: Scraper Orchestrator (scraper cron val)

Schedule: Monthly (first Monday of each month)

Logic:

import { scrapeChooseChicago } from "./scrapeChooseChicago";
import { normalizeDates } from "./utils";
import { loadToSQLite } from "./db";

export default async function () {
  try {
    // Run all scrapers
    const rawData = await scrapeChooseChicago();

    // Normalize dates
    const normalized = await normalizeDates(rawData);

    // Load to SQLite
    await loadToSQLite(normalized, "choose-chicago");

    console.log(`Scraped ${normalized.length} free day entries`);
  } catch (error) {
    console.error("Scraper failed:", error);
    // Optional: send alert/email
  }
}

Future expansion:

  • Add more scraper functions (Loop Chicago, individual museums)
  • Call them all in sequence (see the sketch below)
  • Each writes to the same scraped_free_days table under its own source identifier (and source_url)
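
A possible shape for the orchestrator once more sources exist (scrapeLoopChicago is hypothetical; each source keeps its own identifier so loadToSQLite only replaces its own rows):

import { scrapeChooseChicago } from "./scrapeChooseChicago";
import { scrapeLoopChicago } from "./scrapeLoopChicago"; // hypothetical second scraper
import { normalizeDates } from "./utils";
import { loadToSQLite } from "./db";

const scrapers = [
  { source: "choose-chicago", run: scrapeChooseChicago },
  { source: "loop-chicago", run: scrapeLoopChicago },
];

export default async function () {
  for (const { source, run } of scrapers) {
    try {
      const normalized = await normalizeDates(await run());
      await loadToSQLite(normalized, source);
      console.log(`${source}: loaded ${normalized.length} entries`);
    } catch (error) {
      // One failing source should not block the others
      console.error(`${source} scraper failed:`, error);
    }
  }
}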

Step 5: Bluesky Bot (bot cron val)

Schedule: Every Monday at 8am Central Time

Note: Val Town cron schedules are specified in UTC. To post at 8am Central:

  • Standard Time (CST, UTC-6): Use 0 14 * * 1 (14:00 UTC)
  • Daylight Time (CDT, UTC-5): Use 0 13 * * 1 (13:00 UTC)
  • Or check Val Town docs for timezone configuration options; a DST-proof guard is sketched below
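
One DST-proof option is to schedule the cron at both candidate UTC hours and guard on Chicago local time inside the handler. A minimal sketch, assuming a schedule like 0 13,14 * * 1 (Mondays at 13:00 and 14:00 UTC):

// Only proceed when it is actually 8am in Chicago; with the cron firing at
// both 13:00 and 14:00 UTC on Mondays, exactly one of the two runs posts.
function isEightAmInChicago(now = new Date()): boolean {
  const hour = new Intl.DateTimeFormat("en-US", {
    timeZone: "America/Chicago",
    hour: "numeric",
    hour12: false,
  }).format(now);
  return Number(hour) === 8;
}

export default async function () {
  if (!isEightAmInChicago()) return; // wrong run for the current UTC offset
  // ...continue with the bot logic below
}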

Dependencies:

  • @atproto/api (Bluesky SDK)

Logic:

import { BskyAgent } from "@atproto/api";
import { getUpcomingWeek, getLatestScrapeTime } from "./db";
import type { FreeDayEntry } from "./utils";

const agent = new BskyAgent({ service: "https://bsky.social" });

export default async function () {
  // Login to Bluesky
  await agent.login({
    identifier: Deno.env.get("BLUESKY_HANDLE")!,
    password: Deno.env.get("BLUESKY_PASSWORD")!,
  });

  // Get upcoming free days
  const freeDays = await getUpcomingWeek();

  // Guard: Don't post if no data available
  if (freeDays.length === 0) {
    console.warn("No free days found for upcoming week - skipping post");
    return;
  }

  // Optional: Check data freshness
  const lastScrape = await getLatestScrapeTime();
  const daysSinceUpdate = lastScrape
    ? Math.floor((Date.now() - new Date(lastScrape).getTime()) / (1000 * 60 * 60 * 24))
    : null;

  // Format message
  const message = formatMessage(freeDays, daysSinceUpdate);

  // Post to Bluesky
  await agent.post({ text: message });
}

function formatMessage(freeDays: FreeDayEntry[], daysSinceUpdate?: number | null): string {
  // Group by institution
  // Format as readable list
  // Add footer with data source link
  let footer = "Data: github.com/your-repo";
  if (daysSinceUpdate && daysSinceUpdate > 14) {
    footer = `⚠️ Data last updated ${daysSinceUpdate} days ago\n` + footer;
  }

  return `
🏛️ Free museums in Chicago this week:

• Art Institute: Mon-Fri (weekdays, 11am-close)
• Field Museum: Wednesday
• MCA: Tuesday evenings 5-9pm

+${freeDays.length - 3} more → [link to API]

${footer}
  `.trim();
}

Post format considerations:

  • Keep under 300 characters for the main post (see the sketch below)
  • Use thread if >5 museums
  • Link to full calendar/API
  • Emoji for visual interest
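
The formatMessage stub above hardcodes its example lines; the sketch below shows the grouping and truncation logic these considerations imply. buildPostText is a hypothetical helper and the length budget is approximate.

// Hypothetical helper: group entries by institution and trim to the 300-character limit
function buildPostText(freeDays: FreeDayEntry[], footer: string): string {
  // Group dates by institution so each museum appears only once
  const byInstitution = new Map<string, string[]>();
  for (const d of freeDays) {
    const dates = byInstitution.get(d.institution) ?? [];
    dates.push(d.date.slice(5)); // MM-DD is enough for a weekly post
    byInstitution.set(d.institution, dates);
  }

  const lines = [...byInstitution.entries()].map(
    ([name, dates]) => `• ${name}: ${dates.join(", ")}`,
  );

  // Add museum lines until the next one would push past ~300 characters,
  // reserving room for an overflow line and the footer
  const header = "🏛️ Free museums in Chicago this week:";
  const reserve = `\n+${lines.length} more → [link to API]`.length + footer.length + 2;
  let text = header;
  let shown = 0;
  for (const line of lines) {
    if (text.length + line.length + 1 + reserve > 300) break;
    text += "\n" + line;
    shown++;
  }
  if (shown < lines.length) text += `\n+${lines.length - shown} more → [link to API]`;
  return `${text}\n\n${footer}`;
}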

Step 6: API Endpoint (api HTTP val)

Dependencies:

  • hono (lightweight web framework)

Routes:

import { Hono } from 'hono';
import { getFreeDays, getUpcomingWeek } from './db';

const app = new Hono();

// Get free days for date range
app.get('/free-days', async (c) => {
  const start = c.req.query('start'); // YYYY-MM-DD
  const end = c.req.query('end');     // YYYY-MM-DD

  if (!start || !end) {
    return c.json({ error: 'start and end dates required' }, 400);
  }

  const data = await getFreeDays(start, end);
  return c.json({ data, count: data.length });
});

// Get upcoming week (convenience endpoint)
app.get('/upcoming', async (c) => {
  const data = await getUpcomingWeek();
  return c.json({ data, count: data.length });
});

// Get all institutions (for filtering)
app.get('/institutions', async (c) => {
  // Query distinct institutions from both tables
  return c.json({ institutions: [] }); // TODO: fill in from a SELECT DISTINCT over both tables
});

export default app.fetch;
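
Example call against the deployed HTTP val (the URL is hypothetical; Val Town assigns one per val):

// Fetch a date range from the public API (hypothetical deployment URL)
const res = await fetch(
  "https://yourname-api.web.val.run/free-days?start=2025-01-06&end=2025-01-12",
);
const { data, count } = await res.json();
console.log(`${count} free museum days found`);
// data: [{ institution, date, time, proof_required, notes }, ...]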

Future additions:

  • /ics endpoint to generate iCal feed
  • /institution/:name to filter by museum
  • /today for current day only

Environment Variables

Set these in Val Town secrets:

BLUESKY_HANDLE=your-handle.bsky.social
BLUESKY_PASSWORD=your-app-password

Testing Strategy

Phase 1: Database

  1. Create db val
  2. Run initDatabase()
  3. Manually insert test data
  4. Verify queries work

Phase 2: Scraper + Normalizer

  1. Run scrapeChooseChicago() manually
  2. Inspect raw output
  3. Pass to normalizeDates()
  4. Verify date parsing quality
  5. Check for unparseable entries (aim for <20% failure rate)

Phase 3: Integration

  1. Run full scraper → normalizer → loader pipeline
  2. Query database, verify data looks correct
  3. Add manual overrides for failed parsing
  4. Re-query, confirm overrides work

Phase 4: Bot

  1. Test Bluesky post with hardcoded message
  2. Test with real database query
  3. Verify formatting looks good on Bluesky
  4. Enable cron schedule

Phase 5: API

  1. Test each endpoint locally
  2. Verify JSON responses
  3. Test date range edge cases
  4. Deploy and test public URL

Success Criteria

Minimum Viable Product (MVP)

  • ✅ Scraper runs monthly without errors
  • ✅ Database contains ≥15 Chicago institutions
  • ✅ Bot posts every Monday at 8am
  • ✅ API returns valid JSON for date ranges
  • ✅ Manual intervention <1 hour/month

Stretch Goals

  • Add Loop Chicago scraper
  • Add individual museum scrapers for missing institutions
  • Implement ICS calendar feed endpoint
  • Support filtering by neighborhood/museum type
  • Community contributions (PR process for corrections)

Known Limitations & Mitigations

Limitation                          | Impact                    | Mitigation
------------------------------------|---------------------------|-------------------------------------------------
Choose Chicago updates irregularly  | Stale data                | Run scraper monthly, add manual overrides
Date parsing ~80% accurate          | Missing/incorrect dates   | Manual review process, improve parser over time
Proof-of-residency rules change     | Incorrect info            | Version in git, add notes field
Bluesky API rate limits             | Posting failures          | Respect rate limits, add retry logic
SQLite size limits                  | Unlikely (450 rows/year)  | Monitor, could archive old dates if needed

Maintenance Schedule

Monthly:

  • Review scraper logs for failures
  • Check for unparseable dates
  • Add manual overrides as needed

Quarterly:

  • Audit data quality against official museum sites
  • Update parser for new date formats
  • Review Bluesky engagement metrics

Annually:

  • Update recurring patterns (if museums change schedules)
  • Archive old dates (optional)
  • Expand to new data sources if successful

Future Enhancements

  1. Multi-city support: Expand to other cities with free museum programs
  2. ICS calendar feed: Users can subscribe in Google/Apple Calendar
  3. Embeddable widget: Iframe for community sites
  4. SMS notifications: Text alerts for favorite museums
  5. Accessibility filters: Filter by wheelchair access, sensory-friendly hours
  6. Community contributions: GitHub repo for data corrections/additions

Technical Debt to Avoid

  • ❌ Don't pre-optimize date parsing - start simple, improve as needed
  • ❌ Don't build complex admin UI - Val Town console + SQLite tool is enough
  • ❌ Don't add authentication to API - keep it open (rate limit if abused)
  • ❌ Don't overthink schema - simple tables work fine at this scale

Resources

Next Steps

Start with Step 1: Database Setup - create the db val and initialize the schema. Once tables exist, you can test the entire pipeline with manual data before building the scraper.
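
For that manual test, a minimal seed script could look like this (assumes Step 1's db val is implemented; the rows are illustrative, not real museum data):

import { initDatabase, loadToSQLite, getFreeDays } from "./db";

await initDatabase();

// Illustrative rows only - replace with data you have verified
await loadToSQLite(
  [
    {
      institution: "Field Museum",
      date: "2025-02-12",
      time: "9am-5pm",
      proof_required: "Illinois ID required",
      notes: "Basic admission only",
      source_url: "https://example.com/test",
    },
  ],
  "manual-test",
);

console.log(await getFreeDays("2025-02-10", "2025-02-16"));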