A Bluesky bot that posts weekly updates about free museum days in Chicago, built on Val Town with SQLite storage, automated scraping, and a public API.
Vals:
├── scrapeChooseChicago (function - extracts data from Choose Chicago)
├── scraper (cron - orchestrates all scraper functions)
├── utils (function - normalizeDates, date expansion helpers)
├── db (function - loadToSQLite, query functions)
├── bot (cron - postToBluesky, runs Monday mornings)
└── api (HTTP - Hono API for data access)
SQLite Database:
├── scraped_free_days (auto-populated by scrapers)
├── free_days_overrides (manual corrections/additions)
└── Union query merges both (overrides win on conflicts)
Choose Chicago (HTML)
↓
scrapeChooseChicago() → [{institution, rawLines[], url}]
↓
normalizeDates() → [{institution, date, time, proof_required, notes}]
↓
loadToSQLite() → scraped_free_days table
↓
Query (union with overrides)
↓
[Branch 1] postToBluesky() → Bluesky post
[Branch 2] API endpoint → JSON response
Auto-populated by scrapers. Replaced on each scraper run.
CREATE TABLE IF NOT EXISTS scraped_free_days (
id INTEGER PRIMARY KEY AUTOINCREMENT,
institution TEXT NOT NULL,
institution_key TEXT NOT NULL, -- Slugified institution name for matching
date TEXT NOT NULL, -- ISO format YYYY-MM-DD
time TEXT, -- e.g., "9am-5pm", "11am-close"
proof_required TEXT, -- e.g., "Illinois ID required", "ZIP code verification"
notes TEXT, -- Additional info (holidays, registration required, etc.)
source TEXT NOT NULL, -- Source identifier (e.g., "choose-chicago")
source_url TEXT, -- Where this data was scraped from
scraped_at TEXT NOT NULL, -- ISO timestamp of scrape
UNIQUE(institution_key, date) -- Prevent duplicate entries per institution/date
);
CREATE INDEX IF NOT EXISTS idx_scraped_date ON scraped_free_days(date);
CREATE INDEX IF NOT EXISTS idx_scraped_institution_key ON scraped_free_days(institution_key);
CREATE INDEX IF NOT EXISTS idx_scraped_source ON scraped_free_days(source);
Manual corrections for when scrapers fail or data needs fixing. Overrides can also suppress incorrect scraped data (see the example after the schema).
CREATE TABLE IF NOT EXISTS free_days_overrides (
id INTEGER PRIMARY KEY AUTOINCREMENT,
institution TEXT NOT NULL,
institution_key TEXT NOT NULL, -- Slugified institution name for matching
date TEXT NOT NULL, -- ISO format YYYY-MM-DD
time TEXT,
proof_required TEXT,
notes TEXT,
is_cancelled INTEGER DEFAULT 0, -- Set to 1 to suppress a scraped entry
override_reason TEXT, -- Why this override exists
created_at TEXT NOT NULL, -- When override was added
UNIQUE(institution_key, date)
);
CREATE INDEX IF NOT EXISTS idx_override_date ON free_days_overrides(date);
CREATE INDEX IF NOT EXISTS idx_override_institution_key ON free_days_overrides(institution_key);
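For example, hiding a bad scraped entry takes a single row with `is_cancelled = 1` (a minimal sketch using the std/sqlite client; the institution and date are illustrative):

import { sqlite } from "https://esm.town/v/std/sqlite";

// The anti-join query below skips any scraped row with a matching override,
// and is_cancelled = 1 keeps the override itself out of the results too.
await sqlite.execute({
  sql: `INSERT INTO free_days_overrides
          (institution, institution_key, date, is_cancelled, override_reason, created_at)
        VALUES (?, ?, ?, 1, ?, ?)`,
  args: [
    "Art Institute of Chicago",
    "art-institute-of-chicago",
    "2025-01-20", // illustrative date
    "Holiday closure; scraped entry incorrect",
    new Date().toISOString(),
  ],
});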
When querying for free days, overrides take precedence over scraped data. Use a LEFT JOIN anti-join for deterministic results (the previous GROUP BY approach relied on SQLite's non-deterministic bare-column behavior):
-- Get free days for a date range (overrides win, cancelled entries excluded)
SELECT s.institution, s.date, s.time, s.proof_required, s.notes
FROM scraped_free_days s
LEFT JOIN free_days_overrides o
ON o.institution_key = s.institution_key AND o.date = s.date
WHERE s.date BETWEEN ? AND ?
AND o.id IS NULL -- No override exists for this entry
UNION ALL
SELECT o.institution, o.date, o.time, o.proof_required, o.notes
FROM free_days_overrides o
WHERE o.date BETWEEN ? AND ?
AND COALESCE(o.is_cancelled, 0) = 0 -- Exclude cancelled entries
ORDER BY date, institution;
Create the db val with:
Functions:
- `initDatabase()` - Create tables and indexes
- `loadToSQLite(data, source)` - Insert scraped data (deletes old entries from same source first)
- `getFreeDays(startDate, endDate)` - Query with override logic
- `addOverride(institution, date, details)` - Manual correction helper (sketched after the implementation)
- `getUpcomingWeek()` - Helper for bot (next 7 days from today)
- `getLatestScrapeTime()` - Get timestamp of most recent scrape (for staleness check)
- `slugify(name)` - Generate institution_key from institution name
- `chicagoDate(date?)` - Get ISO date string in America/Chicago timezone
Implementation:
import { sqlite } from "https://esm.town/v/std/sqlite";
import type { FreeDayEntry } from "./utils"; // type-only, so no runtime cycle
// Get ISO date string in Chicago timezone (avoids UTC date offset issues)
export function chicagoDate(d = new Date()): string {
const parts = new Intl.DateTimeFormat("en-CA", {
timeZone: "America/Chicago",
year: "numeric",
month: "2-digit",
day: "2-digit",
}).formatToParts(d);
const y = parts.find((p) => p.type === "year")!.value;
const m = parts.find((p) => p.type === "month")!.value;
const day = parts.find((p) => p.type === "day")!.value;
return `${y}-${m}-${day}`;
}
// Generate a stable key for matching institutions across sources/overrides
export function slugify(name: string): string {
return name
.toLowerCase()
.replace(/[^a-z0-9]+/g, "-")
.replace(/^-|-$/g, "");
}
// Initialize tables
export async function initDatabase() {
// Create scraped_free_days table
// Create free_days_overrides table
// Create indexes
}
// Load scraped data (replaces old scraped data from same source)
export async function loadToSQLite(data: Array<FreeDayEntry>, source: string) {
  // Use sqlite.batch() for an atomic transaction
  await sqlite.batch([
    { sql: "DELETE FROM scraped_free_days WHERE source = ?", args: [source] },
    ...data.map((entry) => ({
      // INSERT OR REPLACE tolerates duplicate (institution_key, date) pairs
      // within a single scrape instead of aborting the whole batch
      sql: `INSERT OR REPLACE INTO scraped_free_days
        (institution, institution_key, date, time, proof_required, notes, source, source_url, scraped_at)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)`,
      args: [
        entry.institution,
        slugify(entry.institution),
        entry.date,
        entry.time ?? null, // optional fields must be null, never undefined
        entry.proof_required ?? null,
        entry.notes ?? null,
        source,
        entry.source_url ?? null,
        new Date().toISOString(),
      ],
    })),
  ]);
}
// Query with override logic (LEFT JOIN anti-join, as shown above)
export async function getFreeDays(startDate: string, endDate: string) {
  const result = await sqlite.execute({
    sql: `SELECT s.institution, s.date, s.time, s.proof_required, s.notes
          FROM scraped_free_days s
          LEFT JOIN free_days_overrides o
            ON o.institution_key = s.institution_key AND o.date = s.date
          WHERE s.date BETWEEN ? AND ? AND o.id IS NULL
          UNION ALL
          SELECT o.institution, o.date, o.time, o.proof_required, o.notes
          FROM free_days_overrides o
          WHERE o.date BETWEEN ? AND ? AND COALESCE(o.is_cancelled, 0) = 0
          ORDER BY date, institution`,
    args: [startDate, endDate, startDate, endDate],
  });
  return result.rows;
}
// Helper for bot (uses Chicago timezone)
export async function getUpcomingWeek() {
const today = chicagoDate();
const nextWeekDate = new Date();
nextWeekDate.setDate(nextWeekDate.getDate() + 7);
const nextWeek = chicagoDate(nextWeekDate);
return getFreeDays(today, nextWeek);
}
// Check data freshness for bot staleness warning
export async function getLatestScrapeTime(): Promise<string | null> {
const result = await sqlite.execute(
"SELECT MAX(scraped_at) as latest FROM scraped_free_days",
);
return result.rows[0]?.latest ?? null;
}
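The `addOverride` helper from the function list isn't spelled out above; here is a minimal sketch (the shape of `details` is an assumption, adjust to taste):

// Manual correction helper (sketch). INSERT OR REPLACE lets a correction be
// re-run for the same institution/date without tripping the UNIQUE constraint.
export async function addOverride(
  institution: string,
  date: string,
  details: {
    time?: string;
    proof_required?: string;
    notes?: string;
    is_cancelled?: boolean;
    reason?: string;
  } = {},
) {
  await sqlite.execute({
    sql: `INSERT OR REPLACE INTO free_days_overrides
            (institution, institution_key, date, time, proof_required, notes, is_cancelled, override_reason, created_at)
          VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)`,
    args: [
      institution,
      slugify(institution),
      date,
      details.time ?? null,
      details.proof_required ?? null,
      details.notes ?? null,
      details.is_cancelled ? 1 : 0,
      details.reason ?? null,
      new Date().toISOString(),
    ],
  });
}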
Dependencies:
- `cheerio` (HTML parsing)
Function signature:
export interface RawFreeDayData {
institution: string;
rawLines: string[]; // All lines from HTML (dates AND eligibility notes mixed)
url?: string;
}
export async function scrapeChooseChicago(): Promise<RawFreeDayData[]>;
Scraping logic:
- Fetch HTML from Choose Chicago (with timeout and User-Agent)
- Parse structure:
  - Find `<h3>` headings (institution names)
  - Extract the following `<ul><li>` lists (all text, not just dates)
- Return structured data

Important: Choose Chicago mixes date info and eligibility notes in the same `<li>` list. The normalizer will classify each line.
Example HTML structure to parse:
<h3>Art Institute of Chicago</h3>
<ul>
  <li>January 5 - February 28 (weekdays only, 11am-close)</li>
  <li>Illinois residents with ID</li>
</ul>
Implementation sketch:
import * as cheerio from "cheerio";

const SOURCE_URL =
  "https://www.choosechicago.com/articles/museums-art/free-museum-days-in-chicago/";

export async function scrapeChooseChicago(): Promise<RawFreeDayData[]> {
  // Fetch with timeout and polite User-Agent
  const controller = new AbortController();
  const timeout = setTimeout(() => controller.abort(), 15_000);
  const response = await fetch(SOURCE_URL, {
    signal: controller.signal,
    headers: { "User-Agent": "chicago-free-museum-days-bot/1.0" },
  });
  clearTimeout(timeout);
  if (!response.ok) throw new Error(`Fetch failed: ${response.status}`);
  const html = await response.text();

  const $ = cheerio.load(html);
  const results: RawFreeDayData[] = [];
  $("h3").each((i, el) => {
    const institution = $(el).text().trim();
    const lines: string[] = [];
    // Collect following ul/li elements (dates AND eligibility notes mixed)
    $(el)
      .nextUntil("h3", "ul")
      .find("li")
      .each((j, li) => {
        lines.push($(li).text().trim());
      });
    results.push({ institution, rawLines: lines, url: SOURCE_URL });
  });
  return results;
}
Dependencies:
- `chrono-node` (natural language date parsing)
Functions:
- `classifyLine(line)` - Determine whether a line is date-ish or note-ish
- `normalizeDates(rawData)` - Parse raw strings into structured dates
- `expandPattern(pattern, institution, notes?)` - Expand "every Wednesday" into dated entries
- `parseTimeRange(timeStr)` - Normalize time formats
Key logic:
Choose Chicago mixes date info and eligibility notes in the same `<li>` list. The normalizer uses a two-pass approach:
- Classify each line as date-ish or note-ish
- Aggregate note-ish lines into `proof_required`/`notes` for the institution
- Expand date-ish lines into individual `FreeDayEntry` records
import * as chrono from "chrono-node";
import { chicagoDate } from "./db";

export interface FreeDayEntry {
institution: string;
date: string; // ISO YYYY-MM-DD
time?: string;
proof_required?: string;
notes?: string;
source_url?: string;
}
// Patterns that indicate eligibility/requirements rather than dates
const NOTE_PATTERNS = [
/illinois/i,
/resident/i,
/\bID\b/,
/required/i,
/must\b/i,
/free for/i,
/tickets/i,
/registration/i,
/zip code/i,
/proof/i,
];
// Patterns that indicate date information
const DATE_PATTERNS = [
/january|february|march|april|may|june|july|august|september|october|november|december/i,
/\d{1,2}(st|nd|rd|th)?/,
/every\s+(monday|tuesday|wednesday|thursday|friday|saturday|sunday)/i,
/weekdays?/i,
/through|to|-/,
];
function classifyLine(line: string): "date" | "note" {
const noteScore = NOTE_PATTERNS.filter((p) => p.test(line)).length;
const dateScore = DATE_PATTERNS.filter((p) => p.test(line)).length;
// If it looks more like a note, classify as note
if (noteScore > dateScore) return "note";
// If it has date indicators, classify as date
if (dateScore > 0) return "date";
// Default to note (safer to not generate bad date entries)
return "note";
}
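Applied to the example HTML above, the classifier splits the two `<li>` lines as intended:

classifyLine("January 5 - February 28 (weekdays only, 11am-close)"); // "date" (month name, digits, range dash)
classifyLine("Illinois residents with ID"); // "note" (matches /illinois/i, /resident/i, /\bID\b/)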
export async function normalizeDates(
rawData: RawFreeDayData[],
): Promise<FreeDayEntry[]> {
const results: FreeDayEntry[] = [];
for (const item of rawData) {
// Pass 1: Classify all lines
const dateLines: string[] = [];
const noteLines: string[] = [];
for (const line of item.rawLines) {
if (classifyLine(line) === "date") {
dateLines.push(line);
} else {
noteLines.push(line);
}
}
// Aggregate notes for this institution
const aggregatedNotes = noteLines.join("; ") || undefined;
// Pass 2: Parse date lines
for (const dateLine of dateLines) {
// Try chrono-node parsing
const parsed = chrono.parse(dateLine);
      if (parsed.length > 0) {
        // Expand single dates or ranges into individual entries,
        // attaching aggregatedNotes to each one.
        // NOTE: qualifiers like "weekdays only" are not yet honored here.
        for (const r of parsed) {
          const start = r.start.date();
          const end = r.end ? r.end.date() : start;
          for (let d = new Date(start); d <= end; d.setDate(d.getDate() + 1)) {
            results.push({
              institution: item.institution,
              date: chicagoDate(d),
              notes: aggregatedNotes,
              source_url: item.url,
            });
          }
        }
} else if (/every/i.test(dateLine)) {
// Handle recurring patterns
const expanded = expandPattern(dateLine, item.institution, aggregatedNotes);
results.push(...expanded);
} else {
// Log unparseable entry for manual review
console.warn("Could not parse date line:", dateLine);
}
}
}
return results;
}
// Expand "every Wednesday" into next 12 months of Wednesdays
function expandPattern(
pattern: string,
institution: string,
notes?: string,
): FreeDayEntry[] {
// Parse day of week
// Generate dates for next 12 months
// Return array of entries with notes attached
}
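With this, `expandPattern("Free every Wednesday", "Field Museum")` yields roughly 52 entries, one per Wednesday over the coming year. The last helper from the function list, `parseTimeRange`, is pure string cleanup; a minimal sketch (the exact normalization rules are assumptions):

// Sketch: normalize scraped time strings like "11 AM – Close" to "11am-close"
function parseTimeRange(timeStr: string): string | undefined {
  const cleaned = timeStr
    .toLowerCase()
    .replace(/\s*(–|—|to)\s*/g, "-") // unify range separators
    .replace(/\s*([ap])\.?m\.?/g, "$1m") // "a.m." -> "am"
    .replace(/\s+/g, "");
  return /\d/.test(cleaned) ? cleaned : undefined; // no digits means it wasn't a time
}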
Schedule: Monthly (first Monday of each month)
Logic:
import { scrapeChooseChicago } from "./scrapeChooseChicago";
import { normalizeDates } from "./utils";
import { loadToSQLite } from "./db";
export default async function () {
try {
// Run all scrapers
const rawData = await scrapeChooseChicago();
// Normalize dates
const normalized = await normalizeDates(rawData);
// Load to SQLite
await loadToSQLite(normalized, "choose-chicago");
console.log(`Scraped ${normalized.length} free day entries`);
} catch (error) {
console.error("Scraper failed:", error);
// Optional: send alert/email
}
}
Future expansion:
- Add more scraper functions (Loop Chicago, individual museums)
- Call them all in sequence (see the sketch below)
- Each writes to the same `scraped_free_days` table with its own `source` and `source_url`
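A possible shape for that sequencing (a sketch; `scrapeLoopChicago` is hypothetical and doesn't exist yet):

const scrapers = [
  { fn: scrapeChooseChicago, source: "choose-chicago" },
  // { fn: scrapeLoopChicago, source: "loop-chicago" }, // hypothetical future source
];
for (const { fn, source } of scrapers) {
  try {
    const normalized = await normalizeDates(await fn());
    await loadToSQLite(normalized, source);
    console.log(`${source}: loaded ${normalized.length} entries`);
  } catch (error) {
    // One failing source should not block the others
    console.error(`Scraper ${source} failed:`, error);
  }
}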
Schedule: Every Monday at 8am Central Time
Note: Val Town cron schedules may run in UTC. To run at 8am Central:
- Standard Time (CST, UTC-6): Use `0 14 * * 1` (14:00 UTC)
- Daylight Time (CDT, UTC-5): Use `0 13 * * 1` (13:00 UTC)
- Or check Val Town docs for timezone configuration options
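One way to sidestep the DST split with a single schedule: run the cron at both candidate UTC hours (`0 13,14 * * 1`) and keep only the run that lands on 8am Central. A sketch, with `isEightAmCentral` being a hypothetical helper:

// Returns true only when it is currently 8am in Chicago, whatever the DST offset
function isEightAmCentral(now = new Date()): boolean {
  const hour = Number(
    new Intl.DateTimeFormat("en-US", {
      timeZone: "America/Chicago",
      hour: "numeric",
      hour12: false,
    }).format(now),
  );
  return hour === 8;
}

// At the top of the bot's default export: if (!isEightAmCentral()) return;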
Dependencies:
@atproto/api(Bluesky SDK)
Logic:
import { BskyAgent } from "@atproto/api";
import { getUpcomingWeek, getLatestScrapeTime } from "./db";
import type { FreeDayEntry } from "./utils";
const agent = new BskyAgent({ service: "https://bsky.social" });
export default async function () {
// Login to Bluesky
  await agent.login({
    identifier: Deno.env.get("BLUESKY_HANDLE")!, // set in Val Town secrets
    password: Deno.env.get("BLUESKY_PASSWORD")!,
  });
// Get upcoming free days
const freeDays = await getUpcomingWeek();
// Guard: Don't post if no data available
if (freeDays.length === 0) {
console.warn("No free days found for upcoming week - skipping post");
return;
}
// Optional: Check data freshness
const lastScrape = await getLatestScrapeTime();
const daysSinceUpdate = lastScrape
? Math.floor((Date.now() - new Date(lastScrape).getTime()) / (1000 * 60 * 60 * 24))
: null;
// Format message
const message = formatMessage(freeDays, daysSinceUpdate);
// Post to Bluesky
await agent.post({ text: message });
}
function formatMessage(freeDays: FreeDayEntry[], daysSinceUpdate?: number | null): string {
// Group by institution
// Format as readable list
// Add footer with data source link
let footer = "Data: github.com/your-repo";
if (daysSinceUpdate && daysSinceUpdate > 14) {
footer = `⚠️ Data last updated ${daysSinceUpdate} days ago\n` + footer;
}
return `
🏛️ Free museums in Chicago this week:
• Art Institute: Mon-Fri (weekdays, 11am-close)
• Field Museum: Wednesday
• MCA: Tuesday evenings 5-9pm
+${freeDays.length - 3} more → [link to API]
${footer}
`.trim();
}
Post format considerations:
- Keep under 300 characters for main post (see the length guard sketch after this list)
- Use thread if >5 museums
- Link to full calendar/API
- Emoji for visual interest
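For the length limit, a rough guard before posting (a sketch; it counts code points rather than graphemes, so it is slightly conservative for emoji-heavy text):

// Clamp post text to Bluesky's 300-character limit before posting
function clamp(text: string, max = 300): string {
  const chars = [...text]; // code points, a conservative stand-in for graphemes
  return chars.length <= max ? text : chars.slice(0, max - 1).join("") + "…";
}

// Usage: await agent.post({ text: clamp(message) });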
Dependencies:
hono(lightweight web framework)
Routes:
import { Hono } from 'hono';
import { sqlite } from 'https://esm.town/v/std/sqlite';
import { getFreeDays, getUpcomingWeek } from './db';
const app = new Hono();
// Get free days for date range
app.get('/free-days', async (c) => {
const start = c.req.query('start'); // YYYY-MM-DD
const end = c.req.query('end'); // YYYY-MM-DD
if (!start || !end) {
return c.json({ error: 'start and end dates required' }, 400);
}
const data = await getFreeDays(start, end);
return c.json({ data, count: data.length });
});
// Get upcoming week (convenience endpoint)
app.get('/upcoming', async (c) => {
const data = await getUpcomingWeek();
return c.json({ data, count: data.length });
});
// Get all institutions (for filtering)
app.get('/institutions', async (c) => {
  // UNION deduplicates institutions across both tables
  const result = await sqlite.execute(
    `SELECT institution FROM scraped_free_days
     UNION
     SELECT institution FROM free_days_overrides
     ORDER BY institution`,
  );
  return c.json({ institutions: result.rows.map((r) => r.institution) });
});
export default app.fetch;
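Once deployed, the endpoints can be exercised like this (the host is a placeholder for your val's public URL):

// Hypothetical client call; replace the host with your val's actual URL
const res = await fetch(
  "https://your-val.web.val.run/free-days?start=2025-06-01&end=2025-06-30",
);
const { data, count } = await res.json();
console.log(`${count} free days in June`, data);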
Future additions:
- `/ics` endpoint to generate an iCal feed
- `/institution/:name` to filter by museum
- `/today` for the current day only
Set these in Val Town secrets:
BLUESKY_HANDLE=your-handle.bsky.social
BLUESKY_PASSWORD=your-app-password
- Create `db` val
- Run `initDatabase()`
- Manually insert test data (see the sketch below)
- Verify queries work
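A quick way to seed test data is to call `loadToSQLite` with hand-written entries (values are illustrative):

import { loadToSQLite } from "./db";

// Illustrative rows for exercising the query logic before any scraper exists
await loadToSQLite(
  [
    { institution: "Field Museum", date: "2025-06-11", time: "9am-5pm", proof_required: "Illinois ID required" },
    { institution: "Art Institute of Chicago", date: "2025-06-12", notes: "Illinois residents" },
  ],
  "manual-test",
);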
- Run `scrapeChooseChicago()` manually
- Inspect raw output
- Pass to `normalizeDates()`
- Verify date parsing quality
- Check for unparseable entries (aim for <20% failure rate)
- Run full scraper → normalizer → loader pipeline
- Query database, verify data looks correct
- Add manual overrides for failed parsing
- Re-query, confirm overrides work
- Test Bluesky post with hardcoded message
- Test with real database query
- Verify formatting looks good on Bluesky
- Enable cron schedule
- Test each endpoint locally
- Verify JSON responses
- Test date range edge cases
- Deploy and test public URL
- ✅ Scraper runs monthly without errors
- ✅ Database contains ≥15 Chicago institutions
- ✅ Bot posts every Monday at 8am
- ✅ API returns valid JSON for date ranges
- ✅ Manual intervention <1 hour/month
- Add Loop Chicago scraper
- Add individual museum scrapers for missing institutions
- Implement ICS calendar feed endpoint
- Support filtering by neighborhood/museum type
- Community contributions (PR process for corrections)
| Limitation | Impact | Mitigation |
|---|---|---|
| Choose Chicago updates irregularly | Stale data | Run scraper monthly, add manual overrides |
| Date parsing ~80% accurate | Missing/incorrect dates | Manual review process, improve parser over time |
| Proof-of-residency rules change | Incorrect info | Version in git, add notes field |
| Bluesky API rate limits | Posting failures | Respect rate limits, add retry logic |
| SQLite size limits | Negligible at ~450 rows/year | Monitor; archive old dates if needed |
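For the rate-limit row, one mitigation is a small retry wrapper around the post call (a sketch; the backoff values are arbitrary):

// Naive exponential backoff for transient Bluesky failures (1s, 2s, 4s)
async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      await new Promise((r) => setTimeout(r, 1000 * 2 ** i));
    }
  }
  throw lastError;
}

// Usage: await withRetry(() => agent.post({ text: message }));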
Monthly:
- Review scraper logs for failures
- Check for unparseable dates
- Add manual overrides as needed
Quarterly:
- Audit data quality against official museum sites
- Update parser for new date formats
- Review Bluesky engagement metrics
Annually:
- Update recurring patterns (if museums change schedules)
- Archive old dates (optional)
- Expand to new data sources if successful
- Multi-city support: Expand to other cities with free museum programs
- ICS calendar feed: Users can subscribe in Google/Apple Calendar
- Embeddable widget: Iframe for community sites
- SMS notifications: Text alerts for favorite museums
- Accessibility filters: Filter by wheelchair access, sensory-friendly hours
- Community contributions: GitHub repo for data corrections/additions
- ❌ Don't pre-optimize date parsing - start simple, improve as needed
- ❌ Don't build complex admin UI - Val Town console + SQLite tool is enough
- ❌ Don't add authentication to API - keep it open (rate limit if abused)
- ❌ Don't overthink schema - simple tables work fine at this scale
Start with Step 1: Database Setup - create the db val and initialize the schema. Once tables exist, you can test the entire pipeline with manual data before building the scraper.