A Bluesky bot that posts weekly updates about free museum days in Chicago, built on Val Town with SQLite storage, automated scraping, and a public API.
Vals:
├── scrapeChooseChicago (function - extracts data from Choose Chicago)
├── scraper (cron - orchestrates all scraper functions)
├── utils (function - normalizeDates, date expansion helpers)
├── db (function - loadToSQLite, query functions)
├── bot (cron - postToBluesky, runs Monday mornings)
└── api (HTTP - Hono API for data access)
SQLite Database:
├── scraped_free_days (auto-populated by scrapers)
├── free_days_overrides (manual corrections/additions)
└── Union query merges both (overrides win on conflicts)
Choose Chicago (HTML)
↓
scrapeChooseChicago() → [{name, rawDates[], url, notes}]
↓
normalizeDates() → [{institution, date, time, proof_required, notes}]
↓
loadToSQLite() → scraped_free_days table
↓
Query (union with overrides)
↓
[Branch 1] postToBluesky() → Bluesky post
[Branch 2] API endpoint → JSON response
Auto-populated by scrapers. Replaced on each scraper run.
CREATE TABLE IF NOT EXISTS scraped_free_days (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  institution TEXT NOT NULL,
  date TEXT NOT NULL,        -- ISO format YYYY-MM-DD
  time TEXT,                 -- e.g., "9am-5pm", "11am-close"
  proof_required TEXT,       -- e.g., "Illinois ID required", "ZIP code verification"
  notes TEXT,                -- Additional info (holidays, registration required, etc.)
  source_url TEXT,           -- Where this data was scraped from
  scraped_at TEXT NOT NULL,  -- ISO timestamp of scrape
  UNIQUE(institution, date)  -- Prevent duplicate entries
);
CREATE INDEX IF NOT EXISTS idx_date ON scraped_free_days(date);
CREATE INDEX IF NOT EXISTS idx_institution ON scraped_free_days(institution);
Manual corrections for when scrapers fail or data needs fixing.
CREATE TABLE IF NOT EXISTS free_days_overrides (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  institution TEXT NOT NULL,
  date TEXT NOT NULL,        -- ISO format YYYY-MM-DD
  time TEXT,
  proof_required TEXT,
  notes TEXT,
  override_reason TEXT,      -- Why this override exists
  created_at TEXT NOT NULL,  -- When override was added
  UNIQUE(institution, date)
);
CREATE INDEX IF NOT EXISTS idx_override_date ON free_days_overrides(date);
When querying for free days, always union both tables with overrides taking precedence:
-- Get free days for a date range. With a lone MIN() aggregate, SQLite
-- returns the other selected columns from the row holding the minimum,
-- so override rows (priority 1) shadow scraped rows (priority 2).
SELECT institution, date, time, proof_required, notes, MIN(priority)
FROM (
  SELECT institution, date, time, proof_required, notes, 1 AS priority
  FROM free_days_overrides
  WHERE date BETWEEN ? AND ?
  UNION ALL
  SELECT institution, date, time, proof_required, notes, 2 AS priority
  FROM scraped_free_days
  WHERE date BETWEEN ? AND ?
)
GROUP BY institution, date
ORDER BY date, institution;
Create the db val with:
Functions:
- `initDatabase()` - Create tables and indexes
- `loadToSQLite(data, source)` - Insert scraped data
- `getFreeDays(startDate, endDate)` - Query with override logic
- `addOverride(institution, date, details)` - Manual correction helper
- `getUpcomingWeek()` - Helper for bot (next 7 days from today)
Implementation:
import { sqlite } from "https://esm.town/v/std/sqlite";
// Initialize tables
export async function initDatabase() {
  // Create scraped_free_days table
  // Create free_days_overrides table
  // Create indexes
}

// Load scraped data (replaces old scraped data)
export async function loadToSQLite(data: Array<FreeDayEntry>, source: string) {
  // Begin transaction
  // Delete old entries from same source
  // Insert new entries
  // Commit
}

// Query with override logic
export async function getFreeDays(startDate: string, endDate: string) {
  // Union query as shown above
}

// Helper for bot
export async function getUpcomingWeek() {
  const today = new Date().toISOString().split('T')[0];
  const nextWeek = new Date(Date.now() + 7 * 24 * 60 * 60 * 1000)
    .toISOString().split('T')[0];
  return getFreeDays(today, nextWeek);
}
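Fleshing out the write paths, here is a hedged sketch assuming std/sqlite's `execute()`/`batch()` accept `{ sql, args }` statements (verify the exact result shape against the current std/sqlite docs); `getFreeDays` is then just the union query above bound with `[startDate, endDate, startDate, endDate]`:

import { sqlite } from "https://esm.town/v/std/sqlite";

// Mirrors the FreeDayEntry interface defined in the utils val
type FreeDayEntry = {
  institution: string;
  date: string;
  time?: string;
  proof_required?: string;
  notes?: string;
};

// Replace all rows from one source in a single batch: delete, then bulk insert
export async function loadToSQLite(data: FreeDayEntry[], source: string) {
  await sqlite.batch([
    { sql: "DELETE FROM scraped_free_days WHERE source_url = ?", args: [source] },
    ...data.map((d) => ({
      sql: `INSERT OR REPLACE INTO scraped_free_days
              (institution, date, time, proof_required, notes, source_url, scraped_at)
            VALUES (?, ?, ?, ?, ?, ?, ?)`,
      args: [d.institution, d.date, d.time ?? null, d.proof_required ?? null,
             d.notes ?? null, source, new Date().toISOString()],
    })),
  ]);
}

// Manual correction helper; INSERT OR REPLACE makes it safe to re-run
export async function addOverride(
  institution: string,
  date: string,
  details: { time?: string; proof_required?: string; notes?: string; reason?: string } = {},
) {
  await sqlite.execute({
    sql: `INSERT OR REPLACE INTO free_days_overrides
            (institution, date, time, proof_required, notes, override_reason, created_at)
          VALUES (?, ?, ?, ?, ?, ?, ?)`,
    args: [institution, date, details.time ?? null, details.proof_required ?? null,
           details.notes ?? null, details.reason ?? null, new Date().toISOString()],
  });
}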
Dependencies:
- `cheerio` (HTML parsing)
Function signature:
export interface RawFreeDayData {
  institution: string;
  rawDates: string[]; // Unparsed date strings from HTML
  url?: string;
  notes?: string;
}
export async function scrapeChooseChicago(): Promise<RawFreeDayData[]>
Scraping logic:
- Fetch HTML from Choose Chicago
- Parse structure:
  - Find `<h3>` headings (institution names)
  - Extract the following `<ul><li>` lists (date strings)
- Return structured data
Example HTML structure to parse:
<h3>Art Institute of Chicago</h3>
<ul>
  <li>January 5 - February 28 (weekdays only, 11am-close)</li>
  <li>Illinois residents with ID</li>
</ul>
Implementation:
import * as cheerio from 'npm:cheerio';

// TODO: point this at the Choose Chicago free-museum-days page
const CHOOSE_CHICAGO_URL = 'https://www.choosechicago.com/';

export async function scrapeChooseChicago(): Promise<RawFreeDayData[]> {
  const html = await (await fetch(CHOOSE_CHICAGO_URL)).text();
  const $ = cheerio.load(html);
  const results: RawFreeDayData[] = [];
  $('h3').each((_, el) => {
    const institution = $(el).text().trim();
    const rawDates: string[] = [];
    // Collect every <li> between this heading and the next <h3>
    $(el).nextUntil('h3', 'ul').find('li').each((_, li) => {
      rawDates.push($(li).text().trim());
    });
    // Headings with no date list are page chrome, not museums
    if (rawDates.length > 0) results.push({ institution, rawDates });
  });
  return results;
}
Dependencies:
- `chrono-node` (natural language date parsing)
Functions:
- `normalizeDates(rawData)` - Parse raw strings into structured dates
- `expandPattern(pattern)` - Expand "every Wednesday" into concrete dates for the next 12 months
- `parseTimeRange(timeStr)` - Normalize time formats (sketched after the code below)
Key logic:
import * as chrono from 'npm:chrono-node';

export interface FreeDayEntry {
  institution: string;
  date: string; // ISO YYYY-MM-DD
  time?: string;
  proof_required?: string;
  notes?: string;
}
export async function normalizeDates(
  rawData: RawFreeDayData[]
): Promise<FreeDayEntry[]> {
  const results: FreeDayEntry[] = [];
  for (const item of rawData) {
    for (const rawDate of item.rawDates) {
      // Try chrono-node parsing first
      const parsed = chrono.parse(rawDate);
      if (parsed.length > 0) {
        // Expand a range (or a single date) into individual dates
        const start = parsed[0].start.date();
        const end = parsed[0].end ? parsed[0].end.date() : start;
        for (let d = new Date(start); d <= end; d.setDate(d.getDate() + 1)) {
          results.push({
            institution: item.institution,
            date: d.toISOString().split('T')[0],
            notes: rawDate, // keep the raw text for manual review
          });
        }
      } else if (rawDate.toLowerCase().includes('every')) {
        // Handle recurring patterns like "every Wednesday"
        results.push(
          ...expandPattern(rawDate).map((date) => ({
            institution: item.institution,
            date,
            notes: rawDate,
          })),
        );
      } else {
        // Log unparseable entry for manual review
        console.warn('Could not parse:', rawDate);
      }
    }
  }
  return results;
}
// Expand "every Wednesday" into next 12 months of Wednesdays; returns ISO
// dates and lets the caller attach institution and notes
function expandPattern(pattern: string): string[] {
  const days = ['sunday', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday'];
  // Parse day of week
  const target = days.findIndex((d) => pattern.toLowerCase().includes(d));
  if (target === -1) return [];
  // Generate dates for next 12 months, one week at a time
  // (toISOString() is UTC; fine for a morning cron, adjust if run near midnight)
  const cursor = new Date();
  while (cursor.getDay() !== target) cursor.setDate(cursor.getDate() + 1);
  const end = new Date();
  end.setMonth(end.getMonth() + 12);
  const dates: string[] = [];
  for (; cursor <= end; cursor.setDate(cursor.getDate() + 7)) {
    dates.push(cursor.toISOString().split('T')[0]);
  }
  return dates;
}
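`parseTimeRange` from the function list isn't shown above; a minimal sketch, assuming the page mostly uses formats like "11 A.M. – 5 P.M." or "9am to 5pm":

// Normalize time strings like "11 A.M. – 5 P.M." to the compact "11am-5pm"
// form stored in the database; "11am-close" passes through unchanged
function parseTimeRange(timeStr: string): string {
  return timeStr
    .toLowerCase()
    .replace(/\./g, '')                    // "a.m." -> "am"
    .replace(/\s*[–—]\s*|\s+to\s+/g, '-')  // unify range separators
    .replace(/\s+/g, '');                  // drop remaining spaces
}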
Schedule: Monthly (first Monday of each month)
Logic:
import { scrapeChooseChicago } from "./scrapeChooseChicago";
import { normalizeDates } from "./utils";
import { loadToSQLite } from "./db";
export default async function() {
  try {
    // Run all scrapers
    const rawData = await scrapeChooseChicago();
    // Normalize dates
    const normalized = await normalizeDates(rawData);
    // Load to SQLite
    await loadToSQLite(normalized, 'choose-chicago');
    console.log(`Scraped ${normalized.length} free day entries`);
  } catch (error) {
    console.error('Scraper failed:', error);
    // Optional: send alert/email
  }
}
Future expansion:
- Add more scraper functions (Loop Chicago, individual museums)
- Call them all in sequence (see the sketch below)
- Each writes to the same `scraped_free_days` table with a different `source_url`
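A sketch of that multi-source loop, assuming every scraper shares the RawFreeDayData contract (`scrapeLoopChicago` is hypothetical and left commented out):

import { scrapeChooseChicago } from './scrapeChooseChicago';
import { normalizeDates } from './utils';
import { loadToSQLite } from './db';

const scrapers = [
  { source: 'choose-chicago', run: scrapeChooseChicago },
  // { source: 'loop-chicago', run: scrapeLoopChicago },
];

export default async function() {
  for (const { source, run } of scrapers) {
    try {
      const normalized = await normalizeDates(await run());
      await loadToSQLite(normalized, source);
      console.log(`${source}: loaded ${normalized.length} entries`);
    } catch (error) {
      // One failing source shouldn't abort the rest
      console.error(`${source} scraper failed:`, error);
    }
  }
}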
Schedule: Every Monday at 8am Central Time
Dependencies:
- `@atproto/api` (Bluesky SDK)
Logic:
import { BskyAgent } from 'npm:@atproto/api';
import { getUpcomingWeek } from './db';

const agent = new BskyAgent({ service: 'https://bsky.social' });

export default async function() {
  // Login to Bluesky
  await agent.login({
    identifier: process.env.BLUESKY_HANDLE!,
    password: process.env.BLUESKY_PASSWORD!,
  });
  // Get upcoming free days
  const freeDays = await getUpcomingWeek();
  // Format message
  const message = formatMessage(freeDays);
  // Post to Bluesky
  await agent.post({ text: message });
}
function formatMessage(freeDays: FreeDayEntry[]): string {
  // Group by institution
  // Format as readable list
  // Add footer with data source link
  return `
🏛️ Free museums in Chicago this week:
• Art Institute: Mon-Fri (weekdays, 11am-close)
• Field Museum: Wednesday
• MCA: Tuesday evenings 5-9pm
+${freeDays.length - 3} more → [link to API]
Data: github.com/your-repo
`.trim();
}
Post format considerations:
- Keep under 300 characters for main post
- Use thread if >5 museums
- Link to full calendar/API
- Emoji for visual interest
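Following those constraints, a hedged sketch of a real formatMessage (the grouping and the 300-character trim are the substance; exact wording is a placeholder):

function formatMessage(freeDays: FreeDayEntry[]): string {
  // Group by institution so each museum appears once with all its days
  const byInstitution = new Map<string, string[]>();
  for (const { institution, date } of freeDays) {
    // Parse at noon so the UTC offset can't shift the weekday
    const weekday = new Date(`${date}T12:00:00`).toLocaleDateString('en-US', {
      weekday: 'short',
    });
    byInstitution.set(institution, [...(byInstitution.get(institution) ?? []), weekday]);
  }

  const header = '🏛️ Free museums in Chicago this week:';
  const lines = [...byInstitution].map(([name, days]) => `• ${name}: ${days.join(', ')}`);

  // Drop museums from the end until the post fits Bluesky's 300-char limit
  let kept = lines.length;
  let message = [header, ...lines].join('\n');
  while (message.length > 290 && kept > 1) {
    kept--;
    message = [header, ...lines.slice(0, kept), `+${lines.length - kept} more`].join('\n');
  }
  return message;
}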
Dependencies:
- `hono` (lightweight web framework)
Routes:
import { Hono } from 'npm:hono';
import { getFreeDays, getUpcomingWeek } from './db';

const app = new Hono();

// Get free days for date range
app.get('/free-days', async (c) => {
  const start = c.req.query('start'); // YYYY-MM-DD
  const end = c.req.query('end');     // YYYY-MM-DD
  if (!start || !end) {
    return c.json({ error: 'start and end dates required' }, 400);
  }
  const data = await getFreeDays(start, end);
  return c.json({ data, count: data.length });
});

// Get upcoming week (convenience endpoint)
app.get('/upcoming', async (c) => {
  const data = await getUpcomingWeek();
  return c.json({ data, count: data.length });
});

// Get all institutions (for filtering)
app.get('/institutions', async (c) => {
  // TODO: SELECT DISTINCT institution from both tables via the db val
  const institutions: string[] = [];
  return c.json({ institutions });
});

export default app.fetch;
Future additions:
- `/ics` endpoint to generate an iCal feed (sketched below)
- `/institution/:name` to filter by museum
- `/today` for the current day only
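Of these, `/ics` is mostly string assembly; a hedged sketch that hand-rolls the iCal text rather than pulling in a library (the calendar name and UID scheme are placeholders):

// Serve upcoming free days as an iCal feed users can subscribe to
app.get('/ics', async (c) => {
  const data = await getUpcomingWeek();
  const events = data.map((d: FreeDayEntry) => [
    'BEGIN:VEVENT',
    `UID:${d.institution.replace(/\W+/g, '-')}-${d.date}@chicago-free-museums`,
    `DTSTART;VALUE=DATE:${d.date.replaceAll('-', '')}`, // all-day event
    `SUMMARY:Free day: ${d.institution}`,
    'END:VEVENT',
  ].join('\r\n'));
  const ics = [
    'BEGIN:VCALENDAR',
    'VERSION:2.0',
    'PRODID:-//chicago-free-museums//EN',
    ...events,
    'END:VCALENDAR',
  ].join('\r\n');
  return c.body(ics, 200, { 'Content-Type': 'text/calendar' });
});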
Set these in Val Town secrets:
BLUESKY_HANDLE=your-handle.bsky.social
BLUESKY_PASSWORD=your-app-password
- Create the `db` val
- Run `initDatabase()`
- Manually insert test data (see the smoke test below)
- Verify queries work
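This first phase can be exercised from a scratch val before any scraper exists; a hedged smoke test, assuming the `db` exports listed earlier:

import { initDatabase, loadToSQLite, getFreeDays, addOverride } from './db';

await initDatabase();
// Seed one fake scraped row, then shadow its time with an override
await loadToSQLite(
  [{ institution: 'Test Museum', date: '2025-06-04', time: '9am-5pm' }],
  'manual-test',
);
await addOverride('Test Museum', '2025-06-04', {
  time: '10am-4pm',
  reason: 'testing override precedence',
});
// Expect a single row whose time is "10am-4pm" (the override wins)
console.log(await getFreeDays('2025-06-01', '2025-06-30'));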
- Run `scrapeChooseChicago()` manually
- Inspect raw output
- Pass to `normalizeDates()`
- Verify date parsing quality
- Check for unparseable entries (aim for <20% failure rate)
- Run full scraper → normalizer → loader pipeline
- Query database, verify data looks correct
- Add manual overrides for failed parsing
- Re-query, confirm overrides work
- Test Bluesky post with hardcoded message
- Test with real database query
- Verify formatting looks good on Bluesky
- Enable cron schedule
- Test each endpoint locally
- Verify JSON responses
- Test date range edge cases
- Deploy and test public URL
- ✅ Scraper runs monthly without errors
- ✅ Database contains ≥15 Chicago institutions
- ✅ Bot posts every Monday at 8am
- ✅ API returns valid JSON for date ranges
- ✅ Manual intervention <1 hour/month
- Add Loop Chicago scraper
- Add individual museum scrapers for missing institutions
- Implement ICS calendar feed endpoint
- Support filtering by neighborhood/museum type
- Community contributions (PR process for corrections)
| Limitation | Impact | Mitigation |
|---|---|---|
| Choose Chicago updates irregularly | Stale data | Run scraper monthly, add manual overrides |
| Date parsing ~80% accurate | Missing/incorrect dates | Manual review process, improve parser over time |
| Proof-of-residency rules change | Incorrect info | Version in git, add notes field |
| Bluesky API rate limits | Posting failures | Respect rate limits, add retry logic |
| SQLite size limits | Unlikely to matter (~450 rows/year) | Monitor; archive old dates if needed |
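For the rate-limit row, the retry logic can be a ten-line wrapper; a sketch with exponential backoff (the helper name is ours, not part of @atproto/api):

// Retry a flaky async call with exponential backoff before giving up
async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  for (let i = 1; ; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i >= attempts) throw err;
      await new Promise((r) => setTimeout(r, 1000 * 2 ** i)); // 2s, 4s, ...
    }
  }
}

// In the bot: await withRetry(() => agent.post({ text: message }));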
Monthly:
- Review scraper logs for failures
- Check for unparseable dates
- Add manual overrides as needed
Quarterly:
- Audit data quality against official museum sites
- Update parser for new date formats
- Review Bluesky engagement metrics
Annually:
- Update recurring patterns (if museums change schedules)
- Archive old dates (optional)
- Expand to new data sources if successful
- Multi-city support: Expand to other cities with free museum programs
- ICS calendar feed: Users can subscribe in Google/Apple Calendar
- Embeddable widget: Iframe for community sites
- SMS notifications: Text alerts for favorite museums
- Accessibility filters: Filter by wheelchair access, sensory-friendly hours
- Community contributions: GitHub repo for data corrections/additions
- ❌ Don't pre-optimize date parsing - start simple, improve as needed
- ❌ Don't build complex admin UI - Val Town console + SQLite tool is enough
- ❌ Don't add authentication to API - keep it open (rate limit if abused)
- ❌ Don't overthink schema - simple tables work fine at this scale
Start with Step 1: Database Setup - create the db val and initialize the schema. Once tables exist, you can test the entire pipeline with manual data before building the scraper.