Chicago Free Museum Days - Val Town Implementation Spec

Overview

A Bluesky bot that posts weekly updates about free museum days in Chicago, built on Val Town with SQLite storage, automated scraping, and a public API.

Architecture

Val Town Structure

Vals:
├── scrapeChooseChicago    (function - extracts data from Choose Chicago)
├── scraper                (cron - orchestrates all scraper functions)
├── utils                  (function - normalizeDates, date expansion helpers)
├── db                     (function - loadToSQLite, query functions)
├── bot                    (cron - postToBluesky, runs Monday mornings)
└── api                    (HTTP - Hono API for data access)

SQLite Database:
├── scraped_free_days      (auto-populated by scrapers)
├── free_days_overrides    (manual corrections/additions)
└── Union query merges both (overrides win on conflicts)

Data Flow

Choose Chicago (HTML)
    ↓
scrapeChooseChicago() → [{name, rawDates[], url, notes}]
    ↓
normalizeDates() → [{institution, date, time, proof_required, notes}]
    ↓
loadToSQLite() → scraped_free_days table
    ↓
Query (union with overrides)
    ↓
[Branch 1] postToBluesky() → Bluesky post
[Branch 2] API endpoint → JSON response

Database Schema

Table: scraped_free_days

Auto-populated by scrapers. Replaced on each scraper run.

CREATE TABLE IF NOT EXISTS scraped_free_days (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  institution TEXT NOT NULL,
  institution_key TEXT NOT NULL,   -- Slugified institution name for matching
  date TEXT NOT NULL,              -- ISO format YYYY-MM-DD
  time TEXT,                       -- e.g., "9am-5pm", "11am-close"
  proof_required TEXT,             -- e.g., "Illinois ID required", "ZIP code verification"
  notes TEXT,                      -- Additional info (holidays, registration required, etc.)
  source TEXT NOT NULL,            -- Source identifier (e.g., "choose-chicago")
  source_url TEXT,                 -- Where this data was scraped from
  scraped_at TEXT NOT NULL,        -- ISO timestamp of scrape
  UNIQUE(institution_key, date)    -- Prevent duplicate entries per institution/date
);

CREATE INDEX idx_scraped_date ON scraped_free_days(date);
CREATE INDEX idx_scraped_institution_key ON scraped_free_days(institution_key);
CREATE INDEX idx_scraped_source ON scraped_free_days(source);

Table: free_days_overrides

Manual corrections for when scrapers fail or data needs fixing. Can also suppress incorrect scraped data.

CREATE TABLE IF NOT EXISTS free_days_overrides (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  institution TEXT NOT NULL,
  institution_key TEXT NOT NULL,   -- Slugified institution name for matching
  date TEXT NOT NULL,              -- ISO format YYYY-MM-DD
  time TEXT,
  proof_required TEXT,
  notes TEXT,
  is_cancelled INTEGER DEFAULT 0,  -- Set to 1 to suppress a scraped entry
  override_reason TEXT,            -- Why this override exists
  created_at TEXT NOT NULL,        -- When override was added
  UNIQUE(institution_key, date)
);

CREATE INDEX idx_override_date ON free_days_overrides(date);
CREATE INDEX idx_override_institution_key ON free_days_overrides(institution_key);
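
For example, suppressing a bad scraped entry from the Val Town console could look like the sketch below. It assumes the std sqlite client; the addOverride helper in Step 1 would wrap the same INSERT, and the values are purely illustrative.

import { sqlite } from "https://esm.town/v/std/sqlite";

// Cancel a scraped entry that turned out to be wrong (illustrative values)
await sqlite.execute({
  sql: `INSERT INTO free_days_overrides
          (institution, institution_key, date, is_cancelled, override_reason, created_at)
        VALUES (?, ?, ?, 1, ?, ?)`,
  args: [
    "Art Institute of Chicago",
    "art-institute-of-chicago",
    "2025-03-03",
    "Scraper picked up a members-only preview day",
    new Date().toISOString(),
  ],
});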

Query Logic

When querying for free days, overrides take precedence over scraped data. Use a LEFT JOIN anti-join for deterministic results (the earlier GROUP BY approach relied on SQLite's non-deterministic handling of bare columns in grouped queries):

-- Get free days for a date range (overrides win, cancelled entries excluded)
SELECT s.institution, s.date, s.time, s.proof_required, s.notes
FROM scraped_free_days s
LEFT JOIN free_days_overrides o
  ON o.institution_key = s.institution_key AND o.date = s.date
WHERE s.date BETWEEN ? AND ?
  AND o.id IS NULL                        -- No override exists for this entry

UNION ALL

SELECT o.institution, o.date, o.time, o.proof_required, o.notes
FROM free_days_overrides o
WHERE o.date BETWEEN ? AND ?
  AND COALESCE(o.is_cancelled, 0) = 0     -- Exclude cancelled entries

ORDER BY date, institution;

Implementation Steps

Step 1: Database Setup (db val)

Create the db val with:

Functions:

  • initDatabase() - Create tables and indexes
  • loadToSQLite(data, source) - Insert scraped data (deletes old entries from same source first)
  • getFreeDays(startDate, endDate) - Query with override logic
  • addOverride(institution, date, details) - Manual correction helper
  • getUpcomingWeek() - Helper for bot (next 7 days from today)
  • getLatestScrapeTime() - Get timestamp of most recent scrape (for staleness check)
  • slugify(name) - Generate institution_key from institution name
  • chicagoDate(date?) - Get ISO date string in America/Chicago timezone

Implementation:

import { sqlite } from "https://esm.town/v/std/sqlite";
import type { FreeDayEntry } from "./utils";

// Get ISO date string in Chicago timezone (avoids UTC date offset issues)
export function chicagoDate(d = new Date()): string {
  const parts = new Intl.DateTimeFormat("en-CA", {
    timeZone: "America/Chicago",
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).formatToParts(d);
  const y = parts.find((p) => p.type === "year")!.value;
  const m = parts.find((p) => p.type === "month")!.value;
  const day = parts.find((p) => p.type === "day")!.value;
  return `${y}-${m}-${day}`;
}

// Generate a stable key for matching institutions across sources/overrides
export function slugify(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-|-$/g, "");
}

// Initialize tables
export async function initDatabase() {
  // Create scraped_free_days table
  // Create free_days_overrides table
  // Create indexes
}

// Load scraped data (replaces old scraped data from same source)
export async function loadToSQLite(data: Array<FreeDayEntry>, source: string) {
  // Use sqlite.batch() for atomic transaction
  await sqlite.batch([
    { sql: "DELETE FROM scraped_free_days WHERE source = ?", args: [source] },
    ...data.map((entry) => ({
      sql: `INSERT INTO scraped_free_days
              (institution, institution_key, date, time, proof_required, notes, source, source_url, scraped_at)
            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)`,
      args: [
        entry.institution,
        slugify(entry.institution),
        entry.date,
        entry.time,
        entry.proof_required,
        entry.notes,
        source,
        entry.source_url,
        new Date().toISOString(),
      ],
    })),
  ]);
}

// Query with override logic
export async function getFreeDays(startDate: string, endDate: string) {
  // LEFT JOIN anti-join query as shown above
}

// Helper for bot (uses Chicago timezone)
export async function getUpcomingWeek() {
  const today = chicagoDate();
  const nextWeekDate = new Date();
  nextWeekDate.setDate(nextWeekDate.getDate() + 7);
  const nextWeek = chicagoDate(nextWeekDate);
  return getFreeDays(today, nextWeek);
}

// Check data freshness for bot staleness warning
export async function getLatestScrapeTime(): Promise<string | null> {
  const result = await sqlite.execute(
    "SELECT MAX(scraped_at) as latest FROM scraped_free_days",
  );
  return result.rows[0]?.latest ?? null;
}
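
getFreeDays is left as a stub above; one possible implementation wires the anti-join query from "Query Logic" through sqlite.execute. This is a sketch: the named-column row access mirrors getLatestScrapeTime above and may need adjusting for your sqlite client version.

export async function getFreeDays(startDate: string, endDate: string) {
  const result = await sqlite.execute({
    sql: `SELECT s.institution, s.date, s.time, s.proof_required, s.notes
          FROM scraped_free_days s
          LEFT JOIN free_days_overrides o
            ON o.institution_key = s.institution_key AND o.date = s.date
          WHERE s.date BETWEEN ? AND ? AND o.id IS NULL
          UNION ALL
          SELECT o.institution, o.date, o.time, o.proof_required, o.notes
          FROM free_days_overrides o
          WHERE o.date BETWEEN ? AND ? AND COALESCE(o.is_cancelled, 0) = 0
          ORDER BY date, institution`,
    args: [startDate, endDate, startDate, endDate],
  });

  // Shape rows as FreeDayEntry-like objects for the bot and API
  return result.rows.map((row: any) => ({
    institution: row.institution,
    date: row.date,
    time: row.time,
    proof_required: row.proof_required,
    notes: row.notes,
  }));
}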

Step 2: Scraper (scrapeChooseChicago val)

Dependencies:

  • cheerio (HTML parsing)

Function signature:

export interface RawFreeDayData {
  institution: string;
  rawLines: string[]; // All lines from HTML (dates AND eligibility notes mixed)
  url?: string;
}

export async function scrapeChooseChicago(): Promise<RawFreeDayData[]>;

Scraping logic:

  1. Fetch HTML from Choose Chicago (with timeout and User-Agent)
  2. Parse structure:
    • Find H3 headings (institution names)
    • Extract following <ul><li> lists (all text, not just dates)
  3. Return structured data

Important: Choose Chicago mixes date info and eligibility notes in the same <li> list. The normalizer will classify each line.

Example HTML structure to parse:

<h3>Art Institute of Chicago</h3>
<ul>
  <li>January 5 - February 28 (weekdays only, 11am-close)</li>
  <li>Illinois residents with ID</li>
</ul>

Pseudocode:

import * as cheerio from "npm:cheerio";

const SOURCE_URL =
  "https://www.choosechicago.com/articles/museums-art/free-museum-days-in-chicago/";

// Fetch with timeout and polite User-Agent
const controller = new AbortController();
setTimeout(() => controller.abort(), 15_000);
const response = await fetch(SOURCE_URL, {
  signal: controller.signal,
  headers: { "User-Agent": "chicago-free-museum-days-bot/1.0" },
});
const html = await response.text();

const $ = cheerio.load(html);
const results = [];

$("h3").each((i, el) => {
  const institution = $(el).text().trim();
  const lines = [];

  // Find following ul/li elements (contains dates AND eligibility notes)
  $(el)
    .nextUntil("h3", "ul")
    .find("li")
    .each((j, li) => {
      lines.push($(li).text().trim());
    });

  results.push({ institution, rawLines: lines, url: SOURCE_URL });
});

return results;

Step 3: Utilities (utils val)

Dependencies:

  • chrono-node (natural language date parsing)

Functions:

  • classifyLine(line) - Determine if a line is date-ish or note-ish
  • normalizeDates(rawData) - Parse raw strings into structured dates
  • expandPattern(pattern, institution, notes) - Expand "every Wednesday" into individual dates
  • parseTimeRange(timeStr) - Normalize time formats

Key logic:

As noted in Step 2, Choose Chicago mixes date info and eligibility notes in the same <li> list, so the normalizer uses a two-pass approach:

  1. Classify each line as date-ish or note-ish
  2. Aggregate note-ish lines into proof_required / notes for the institution
  3. Expand date-ish lines into individual FreeDayEntry records
import * as chrono from "npm:chrono-node";
import type { RawFreeDayData } from "./scrapeChooseChicago";

export interface FreeDayEntry {
  institution: string;
  date: string; // ISO YYYY-MM-DD
  time?: string;
  proof_required?: string;
  notes?: string;
  source_url?: string;
}

// Patterns that indicate eligibility/requirements rather than dates
const NOTE_PATTERNS = [
  /illinois/i,
  /resident/i,
  /\bID\b/,
  /required/i,
  /must\b/i,
  /free for/i,
  /tickets/i,
  /registration/i,
  /zip code/i,
  /proof/i,
];

// Patterns that indicate date information
const DATE_PATTERNS = [
  /january|february|march|april|may|june|july|august|september|october|november|december/i,
  /\d{1,2}(st|nd|rd|th)?/,
  /every\s+(monday|tuesday|wednesday|thursday|friday|saturday|sunday)/i,
  /weekdays?/i,
  /through|to|-/,
];

function classifyLine(line: string): "date" | "note" {
  const noteScore = NOTE_PATTERNS.filter((p) => p.test(line)).length;
  const dateScore = DATE_PATTERNS.filter((p) => p.test(line)).length;

  // If it looks more like a note, classify as note
  if (noteScore > dateScore) return "note";
  // If it has date indicators, classify as date
  if (dateScore > 0) return "date";
  // Default to note (safer to not generate bad date entries)
  return "note";
}

export async function normalizeDates(
  rawData: RawFreeDayData[],
): Promise<FreeDayEntry[]> {
  const results: FreeDayEntry[] = [];

  for (const item of rawData) {
    // Pass 1: Classify all lines
    const dateLines: string[] = [];
    const noteLines: string[] = [];
    for (const line of item.rawLines) {
      if (classifyLine(line) === "date") {
        dateLines.push(line);
      } else {
        noteLines.push(line);
      }
    }

    // Aggregate notes for this institution
    const aggregatedNotes = noteLines.join("; ") || undefined;

    // Pass 2: Parse date lines
    for (const dateLine of dateLines) {
      // Try chrono-node parsing
      const parsed = chrono.parse(dateLine);
      if (parsed.length > 0) {
        // Extract date range if present, expand into individual dates
        // Add aggregatedNotes to each entry
      } else if (/every/i.test(dateLine)) {
        // Handle recurring patterns
        const expanded = expandPattern(dateLine, item.institution, aggregatedNotes);
        results.push(...expanded);
      } else {
        // Log unparseable entry for manual review
        console.warn("Could not parse date line:", dateLine);
      }
    }
  }

  return results;
}

// Expand "every Wednesday" into next 12 months of Wednesdays
function expandPattern(
  pattern: string,
  institution: string,
  notes?: string,
): FreeDayEntry[] {
  // Parse day of week
  // Generate dates for next 12 months
  // Return array of entries with notes attached
}
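
expandPattern is only outlined above; a minimal sketch of one way to fill it in follows. It walks forward day by day for roughly a year; the UTC-based date string is a simplification, and chicagoDate from the db val would be the timezone-safe choice.

const DAY_NAMES = [
  "sunday", "monday", "tuesday", "wednesday",
  "thursday", "friday", "saturday",
];

// Expand e.g. "every Wednesday" into dated entries for the next 12 months
function expandPattern(
  pattern: string,
  institution: string,
  notes?: string,
): FreeDayEntry[] {
  const match = pattern
    .toLowerCase()
    .match(/every\s+(sunday|monday|tuesday|wednesday|thursday|friday|saturday)/);
  if (!match) return [];

  const targetDay = DAY_NAMES.indexOf(match[1]);
  const entries: FreeDayEntry[] = [];
  const cursor = new Date();

  // Step forward one day at a time, keeping dates that land on the target weekday
  for (let i = 0; i < 365; i++) {
    if (cursor.getDay() === targetDay) {
      entries.push({
        institution,
        date: cursor.toISOString().slice(0, 10), // YYYY-MM-DD
        notes,
      });
    }
    cursor.setDate(cursor.getDate() + 1);
  }
  return entries;
}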

Step 4: Scraper Orchestrator (scraper cron val)

Schedule: Monthly (first Monday of each month)

Logic:

import { scrapeChooseChicago } from "./scrapeChooseChicago";
import { normalizeDates } from "./utils";
import { loadToSQLite } from "./db";

export default async function () {
  try {
    // Run all scrapers
    const rawData = await scrapeChooseChicago();

    // Normalize dates
    const normalized = await normalizeDates(rawData);

    // Load to SQLite
    await loadToSQLite(normalized, "choose-chicago");

    console.log(`Scraped ${normalized.length} free day entries`);
  } catch (error) {
    console.error("Scraper failed:", error);
    // Optional: send alert/email
  }
}

Future expansion:

  • Add more scraper functions (Loop Chicago, individual museums)
  • Call them all in sequence (see the sketch below)
  • Each writes to the same scraped_free_days table under its own source identifier (and source_url)
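
A possible shape for the orchestrator once more sources exist (scrapeLoopChicago is hypothetical; each source keeps its own identifier so loadToSQLite only replaces its own rows):

import { scrapeChooseChicago } from "./scrapeChooseChicago";
import { scrapeLoopChicago } from "./scrapeLoopChicago"; // hypothetical second scraper
import { normalizeDates } from "./utils";
import { loadToSQLite } from "./db";

const scrapers = [
  { source: "choose-chicago", run: scrapeChooseChicago },
  { source: "loop-chicago", run: scrapeLoopChicago },
];

export default async function () {
  for (const { source, run } of scrapers) {
    try {
      const normalized = await normalizeDates(await run());
      await loadToSQLite(normalized, source);
      console.log(`${source}: loaded ${normalized.length} entries`);
    } catch (error) {
      // One failing source should not block the others
      console.error(`${source} scraper failed:`, error);
    }
  }
}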

Step 5: Bluesky Bot (bot cron val)

Schedule: Every Monday at 8am Central Time

Note: Val Town cron schedules are specified in UTC. To post at 8am Central:

  • Standard Time (CST, UTC-6): Use 0 14 * * 1 (14:00 UTC)
  • Daylight Time (CDT, UTC-5): Use 0 13 * * 1 (13:00 UTC)
  • Or check Val Town docs for timezone configuration options; a DST-proof guard is sketched below
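
One DST-proof option is to schedule the cron at both candidate UTC hours and guard on Chicago local time inside the handler. A minimal sketch, assuming a schedule like 0 13,14 * * 1 (Mondays at 13:00 and 14:00 UTC):

// Only proceed when it is actually 8am in Chicago; with the cron firing at
// both 13:00 and 14:00 UTC on Mondays, exactly one of the two runs posts.
function isEightAmInChicago(now = new Date()): boolean {
  const hour = new Intl.DateTimeFormat("en-US", {
    timeZone: "America/Chicago",
    hour: "numeric",
    hour12: false,
  }).format(now);
  return Number(hour) === 8;
}

export default async function () {
  if (!isEightAmInChicago()) return; // wrong run for the current UTC offset
  // ...continue with the bot logic below
}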

Dependencies:

  • @atproto/api (Bluesky SDK)

Logic:

import { BskyAgent } from "@atproto/api";
import { getUpcomingWeek, getLatestScrapeTime } from "./db";
import type { FreeDayEntry } from "./utils";

const agent = new BskyAgent({ service: "https://bsky.social" });

export default async function () {
  // Login to Bluesky
  await agent.login({
    identifier: Deno.env.get("BLUESKY_HANDLE")!,
    password: Deno.env.get("BLUESKY_PASSWORD")!,
  });

  // Get upcoming free days
  const freeDays = await getUpcomingWeek();

  // Guard: Don't post if no data available
  if (freeDays.length === 0) {
    console.warn("No free days found for upcoming week - skipping post");
    return;
  }

  // Optional: Check data freshness
  const lastScrape = await getLatestScrapeTime();
  const daysSinceUpdate = lastScrape
    ? Math.floor((Date.now() - new Date(lastScrape).getTime()) / (1000 * 60 * 60 * 24))
    : null;

  // Format message
  const message = formatMessage(freeDays, daysSinceUpdate);

  // Post to Bluesky
  await agent.post({ text: message });
}

function formatMessage(freeDays: FreeDayEntry[], daysSinceUpdate?: number | null): string {
  // Group by institution
  // Format as readable list
  // Add footer with data source link
  let footer = "Data: github.com/your-repo";
  if (daysSinceUpdate && daysSinceUpdate > 14) {
    footer = `⚠️ Data last updated ${daysSinceUpdate} days ago\n` + footer;
  }

  return `
🏛️ Free museums in Chicago this week:

• Art Institute: Mon-Fri (weekdays, 11am-close)
• Field Museum: Wednesday
• MCA: Tuesday evenings 5-9pm

+${freeDays.length - 3} more → [link to API]

${footer}
  `.trim();
}

Post format considerations:

  • Keep under 300 characters for the main post (see the sketch below)
  • Use thread if >5 museums
  • Link to full calendar/API
  • Emoji for visual interest
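
The formatMessage stub above hardcodes its example lines; the sketch below shows the grouping and truncation logic these considerations imply. buildPostText is a hypothetical helper and the length budget is approximate.

// Hypothetical helper: group entries by institution and trim to the 300-character limit
function buildPostText(freeDays: FreeDayEntry[], footer: string): string {
  // Group dates by institution so each museum appears only once
  const byInstitution = new Map<string, string[]>();
  for (const d of freeDays) {
    const dates = byInstitution.get(d.institution) ?? [];
    dates.push(d.date.slice(5)); // MM-DD is enough for a weekly post
    byInstitution.set(d.institution, dates);
  }

  const lines = [...byInstitution.entries()].map(
    ([name, dates]) => `• ${name}: ${dates.join(", ")}`,
  );

  // Add museum lines until the next one would push past ~300 characters,
  // reserving room for an overflow line and the footer
  const header = "🏛️ Free museums in Chicago this week:";
  const reserve = `\n+${lines.length} more → [link to API]`.length + footer.length + 2;
  let text = header;
  let shown = 0;
  for (const line of lines) {
    if (text.length + line.length + 1 + reserve > 300) break;
    text += "\n" + line;
    shown++;
  }
  if (shown < lines.length) text += `\n+${lines.length - shown} more → [link to API]`;
  return `${text}\n\n${footer}`;
}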

Step 6: API Endpoint (api HTTP val)

Dependencies:

  • hono (lightweight web framework)

Routes:

import { Hono } from 'hono';
import { getFreeDays, getUpcomingWeek } from './db';

const app = new Hono();

// Get free days for date range
app.get('/free-days', async (c) => {
  const start = c.req.query('start'); // YYYY-MM-DD
  const end = c.req.query('end');     // YYYY-MM-DD

  if (!start || !end) {
    return c.json({ error: 'start and end dates required' }, 400);
  }

  const data = await getFreeDays(start, end);
  return c.json({ data, count: data.length });
});

// Get upcoming week (convenience endpoint)
app.get('/upcoming', async (c) => {
  const data = await getUpcomingWeek();
  return c.json({ data, count: data.length });
});

// Get all institutions (for filtering)
app.get('/institutions', async (c) => {
  // Query distinct institutions from both tables
  return c.json({ institutions: [] }); // TODO: fill in from a SELECT DISTINCT over both tables
});

export default app.fetch;
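
Example call against the deployed HTTP val (the URL is hypothetical; Val Town assigns one per val):

// Fetch a date range from the public API (hypothetical deployment URL)
const res = await fetch(
  "https://yourname-api.web.val.run/free-days?start=2025-01-06&end=2025-01-12",
);
const { data, count } = await res.json();
console.log(`${count} free museum days found`);
// data: [{ institution, date, time, proof_required, notes }, ...]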

Future additions:

  • /ics endpoint to generate iCal feed
  • /institution/:name to filter by museum
  • /today for current day only

Environment Variables

Set these in Val Town secrets:

BLUESKY_HANDLE=your-handle.bsky.social
BLUESKY_PASSWORD=your-app-password

Testing Strategy

Phase 1: Database

  1. Create db val
  2. Run initDatabase()
  3. Manually insert test data
  4. Verify queries work

Phase 2: Scraper + Normalizer

  1. Run scrapeChooseChicago() manually
  2. Inspect raw output
  3. Pass to normalizeDates()
  4. Verify date parsing quality
  5. Check for unparseable entries (aim for <20% failure rate)

Phase 3: Integration

  1. Run full scraper → normalizer → loader pipeline
  2. Query database, verify data looks correct
  3. Add manual overrides for failed parsing
  4. Re-query, confirm overrides work

Phase 4: Bot

  1. Test Bluesky post with hardcoded message
  2. Test with real database query
  3. Verify formatting looks good on Bluesky
  4. Enable cron schedule

Phase 5: API

  1. Test each endpoint locally
  2. Verify JSON responses
  3. Test date range edge cases
  4. Deploy and test public URL

Success Criteria

Minimum Viable Product (MVP)

  • ✅ Scraper runs monthly without errors
  • ✅ Database contains ≥15 Chicago institutions
  • ✅ Bot posts every Monday at 8am
  • ✅ API returns valid JSON for date ranges
  • ✅ Manual intervention <1 hour/month

Stretch Goals

  • Add Loop Chicago scraper
  • Add individual museum scrapers for missing institutions
  • Implement ICS calendar feed endpoint
  • Support filtering by neighborhood/museum type
  • Community contributions (PR process for corrections)

Known Limitations & Mitigations

Limitation                          | Impact                    | Mitigation
------------------------------------|---------------------------|-------------------------------------------------
Choose Chicago updates irregularly  | Stale data                | Run scraper monthly, add manual overrides
Date parsing ~80% accurate          | Missing/incorrect dates   | Manual review process, improve parser over time
Proof-of-residency rules change     | Incorrect info            | Version in git, add notes field
Bluesky API rate limits             | Posting failures          | Respect rate limits, add retry logic
SQLite size limits                  | Unlikely (450 rows/year)  | Monitor, could archive old dates if needed

Maintenance Schedule

Monthly:

  • Review scraper logs for failures
  • Check for unparseable dates
  • Add manual overrides as needed

Quarterly:

  • Audit data quality against official museum sites
  • Update parser for new date formats
  • Review Bluesky engagement metrics

Annually:

  • Update recurring patterns (if museums change schedules)
  • Archive old dates (optional)
  • Expand to new data sources if successful

Future Enhancements

  1. Multi-city support: Expand to other cities with free museum programs
  2. ICS calendar feed: Users can subscribe in Google/Apple Calendar
  3. Embeddable widget: Iframe for community sites
  4. SMS notifications: Text alerts for favorite museums
  5. Accessibility filters: Filter by wheelchair access, sensory-friendly hours
  6. Community contributions: GitHub repo for data corrections/additions

Technical Debt to Avoid

  • ❌ Don't pre-optimize date parsing - start simple, improve as needed
  • ❌ Don't build complex admin UI - Val Town console + SQLite tool is enough
  • ❌ Don't add authentication to API - keep it open (rate limit if abused)
  • ❌ Don't overthink schema - simple tables work fine at this scale

Resources

Next Steps

Start with Step 1: Database Setup - create the db val and initialize the schema. Once tables exist, you can test the entire pipeline with manual data before building the scraper.
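
For that manual test, a minimal seed script could look like this (assumes Step 1's db val is implemented; the rows are illustrative, not real museum data):

import { initDatabase, loadToSQLite, getFreeDays } from "./db";

await initDatabase();

// Illustrative rows only - replace with data you have verified
await loadToSQLite(
  [
    {
      institution: "Field Museum",
      date: "2025-02-12",
      time: "9am-5pm",
      proof_required: "Illinois ID required",
      notes: "Basic admission only",
      source_url: "https://example.com/test",
    },
  ],
  "manual-test",
);

console.log(await getFreeDays("2025-02-10", "2025-02-16"));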