A Bluesky bot that posts weekly updates about free museum days in Chicago, built on Val Town with SQLite storage, automated scraping, and a public API.
Vals:
├── scrapeChooseChicago (function - extracts data from Choose Chicago)
├── scraper (cron - orchestrates all scraper functions)
├── utils (function - normalizeDates, date expansion helpers)
├── db (function - loadToSQLite, query functions)
├── bot (cron - postToBluesky, runs Monday mornings)
└── api (HTTP - Hono API for data access)
SQLite Database:
├── scraped_free_days (auto-populated by scrapers)
├── free_days_overrides (manual corrections/additions)
└── Union query merges both (overrides win on conflicts)
Choose Chicago (HTML)
↓
scrapeChooseChicago() → [{institution, rawLines[], url}]
↓
normalizeDates() → [{institution, date, time, proof_required, notes}]
↓
loadToSQLite() → scraped_free_days table
↓
Query (union with overrides)
↓
[Branch 1] postToBluesky() → Bluesky post
[Branch 2] API endpoint → JSON response
Auto-populated by scrapers. Replaced on each scraper run.
CREATE TABLE IF NOT EXISTS scraped_free_days (
id INTEGER PRIMARY KEY AUTOINCREMENT,
institution TEXT NOT NULL,
institution_key TEXT NOT NULL, -- Slugified institution name for matching
date TEXT NOT NULL, -- ISO format YYYY-MM-DD
time TEXT, -- e.g., "9am-5pm", "11am-close"
proof_required TEXT, -- e.g., "Illinois ID required", "ZIP code verification"
notes TEXT, -- Additional info (holidays, registration required, etc.)
source TEXT NOT NULL, -- Source identifier (e.g., "choose-chicago")
source_url TEXT, -- Where this data was scraped from
scraped_at TEXT NOT NULL, -- ISO timestamp of scrape
UNIQUE(institution_key, date) -- Prevent duplicate entries per institution/date
);
CREATE INDEX IF NOT EXISTS idx_scraped_date ON scraped_free_days(date);
CREATE INDEX IF NOT EXISTS idx_scraped_institution_key ON scraped_free_days(institution_key);
CREATE INDEX IF NOT EXISTS idx_scraped_source ON scraped_free_days(source);
Manual corrections for when scrapers fail or data needs fixing. Can also suppress incorrect scraped data.
CREATE TABLE IF NOT EXISTS free_days_overrides (
id INTEGER PRIMARY KEY AUTOINCREMENT,
institution TEXT NOT NULL,
institution_key TEXT NOT NULL, -- Slugified institution name for matching
date TEXT NOT NULL, -- ISO format YYYY-MM-DD
time TEXT,
proof_required TEXT,
notes TEXT,
is_cancelled INTEGER DEFAULT 0, -- Set to 1 to suppress a scraped entry
override_reason TEXT, -- Why this override exists
created_at TEXT NOT NULL, -- When override was added
UNIQUE(institution_key, date)
);
CREATE INDEX IF NOT EXISTS idx_override_date ON free_days_overrides(date);
CREATE INDEX IF NOT EXISTS idx_override_institution_key ON free_days_overrides(institution_key);
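For example, suppressing a bad scraped row is a single insert with `is_cancelled = 1` (values here are hypothetical):

import { sqlite } from "https://esm.town/v/std/sqlite";

// Hypothetical example: hide a scraped free day that the museum cancelled
await sqlite.execute({
  sql: `INSERT INTO free_days_overrides
          (institution, institution_key, date, is_cancelled, override_reason, created_at)
        VALUES (?, ?, ?, 1, ?, ?)`,
  args: [
    "Art Institute of Chicago",
    "art-institute-of-chicago",
    "2025-03-14",
    "Museum cancelled this free day",
    new Date().toISOString(),
  ],
});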
When querying for free days, overrides take precedence over scraped data. Use a LEFT JOIN / IS NULL anti-join for deterministic results (the previous GROUP BY approach was non-deterministic: SQLite returns an arbitrary row from each group):
-- Get free days for a date range (overrides win, cancelled entries excluded)
SELECT s.institution, s.date, s.time, s.proof_required, s.notes
FROM scraped_free_days s
LEFT JOIN free_days_overrides o
ON o.institution_key = s.institution_key AND o.date = s.date
WHERE s.date BETWEEN ? AND ?
AND o.id IS NULL -- No override exists for this entry
UNION ALL
SELECT o.institution, o.date, o.time, o.proof_required, o.notes
FROM free_days_overrides o
WHERE o.date BETWEEN ? AND ?
AND COALESCE(o.is_cancelled, 0) = 0 -- Exclude cancelled entries
ORDER BY date, institution;
Create the db val with:
Functions:
- `initDatabase()` - Create tables and indexes
- `loadToSQLite(data, source)` - Insert scraped data (deletes old entries from the same source first)
- `getFreeDays(startDate, endDate)` - Query with override logic
- `addOverride(institution, date, details)` - Manual correction helper
- `getUpcomingWeek()` - Helper for the bot (next 7 days from today)
- `slugify(name)` - Generate `institution_key` from the institution name
- `chicagoDate(date?)` - Get an ISO date string in the America/Chicago timezone
Implementation:
import { sqlite } from "https://esm.town/v/std/sqlite";
import type { FreeDayEntry } from "./utils";
// Get ISO date string in Chicago timezone (avoids UTC date offset issues)
export function chicagoDate(d = new Date()): string {
const parts = new Intl.DateTimeFormat("en-CA", {
timeZone: "America/Chicago",
year: "numeric",
month: "2-digit",
day: "2-digit",
}).formatToParts(d);
const y = parts.find((p) => p.type === "year")!.value;
const m = parts.find((p) => p.type === "month")!.value;
const day = parts.find((p) => p.type === "day")!.value;
return `${y}-${m}-${day}`;
}
// Generate a stable key for matching institutions across sources/overrides
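// e.g. slugify("Art Institute of Chicago") === "art-institute-of-chicago"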
export function slugify(name: string): string {
return name
.toLowerCase()
.replace(/[^a-z0-9]+/g, "-")
.replace(/^-|-$/g, "");
}
// Initialize tables
// Initialize tables and indexes (safe to re-run; everything is IF NOT EXISTS)
export async function initDatabase() {
  await sqlite.batch([
    `CREATE TABLE IF NOT EXISTS scraped_free_days (
      id INTEGER PRIMARY KEY AUTOINCREMENT,
      institution TEXT NOT NULL, institution_key TEXT NOT NULL,
      date TEXT NOT NULL, time TEXT, proof_required TEXT, notes TEXT,
      source TEXT NOT NULL, source_url TEXT, scraped_at TEXT NOT NULL,
      UNIQUE(institution_key, date)
    )`,
    `CREATE TABLE IF NOT EXISTS free_days_overrides (
      id INTEGER PRIMARY KEY AUTOINCREMENT,
      institution TEXT NOT NULL, institution_key TEXT NOT NULL,
      date TEXT NOT NULL, time TEXT, proof_required TEXT, notes TEXT,
      is_cancelled INTEGER DEFAULT 0, override_reason TEXT, created_at TEXT NOT NULL,
      UNIQUE(institution_key, date)
    )`,
    `CREATE INDEX IF NOT EXISTS idx_scraped_date ON scraped_free_days(date)`,
    `CREATE INDEX IF NOT EXISTS idx_scraped_institution_key ON scraped_free_days(institution_key)`,
    `CREATE INDEX IF NOT EXISTS idx_scraped_source ON scraped_free_days(source)`,
    `CREATE INDEX IF NOT EXISTS idx_override_date ON free_days_overrides(date)`,
    `CREATE INDEX IF NOT EXISTS idx_override_institution_key ON free_days_overrides(institution_key)`,
  ]);
}
// Load scraped data (replaces old scraped data from same source)
export async function loadToSQLite(data: Array<FreeDayEntry>, source: string) {
// Use sqlite.batch() for atomic transaction
await sqlite.batch([
{ sql: "DELETE FROM scraped_free_days WHERE source = ?", args: [source] },
...data.map((entry) => ({
sql: `INSERT INTO scraped_free_days
(institution, institution_key, date, time, proof_required, notes, source, source_url, scraped_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)`,
args: [
entry.institution,
slugify(entry.institution),
entry.date,
entry.time ?? null, // optional fields must be null, not undefined
entry.proof_required ?? null,
entry.notes ?? null,
source,
entry.source_url ?? null,
new Date().toISOString(),
],
})),
]);
}
// Query with override logic (the anti-join query from the schema section)
export async function getFreeDays(startDate: string, endDate: string) {
  const result = await sqlite.execute({
    sql: `SELECT s.institution, s.date, s.time, s.proof_required, s.notes
          FROM scraped_free_days s
          LEFT JOIN free_days_overrides o
            ON o.institution_key = s.institution_key AND o.date = s.date
          WHERE s.date BETWEEN ? AND ? AND o.id IS NULL
          UNION ALL
          SELECT o.institution, o.date, o.time, o.proof_required, o.notes
          FROM free_days_overrides o
          WHERE o.date BETWEEN ? AND ? AND COALESCE(o.is_cancelled, 0) = 0
          ORDER BY date, institution`,
    args: [startDate, endDate, startDate, endDate],
  });
  // std/sqlite returns positional rows; zip them with the column names
  return result.rows.map((row) =>
    Object.fromEntries(result.columns.map((col, i) => [col, row[i]])),
  );
}
// Helper for bot (uses Chicago timezone)
export async function getUpcomingWeek() {
const today = chicagoDate();
const nextWeekDate = new Date();
nextWeekDate.setDate(nextWeekDate.getDate() + 7);
const nextWeek = chicagoDate(nextWeekDate);
return getFreeDays(today, nextWeek);
}
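`addOverride()` from the function list above isn't shown yet; a minimal sketch, assuming `details` carries the optional columns:

// Minimal sketch of addOverride; INSERT OR REPLACE lets you re-correct a date
export async function addOverride(
  institution: string,
  date: string,
  details: {
    time?: string;
    proof_required?: string;
    notes?: string;
    is_cancelled?: boolean;
    override_reason?: string;
  } = {},
) {
  await sqlite.execute({
    sql: `INSERT OR REPLACE INTO free_days_overrides
            (institution, institution_key, date, time, proof_required, notes,
             is_cancelled, override_reason, created_at)
          VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)`,
    args: [
      institution,
      slugify(institution),
      date,
      details.time ?? null,
      details.proof_required ?? null,
      details.notes ?? null,
      details.is_cancelled ? 1 : 0,
      details.override_reason ?? null,
      new Date().toISOString(),
    ],
  });
}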
Dependencies:
- `cheerio` (HTML parsing)
Function signature:
export interface RawFreeDayData {
institution: string;
rawLines: string[]; // All lines from HTML (dates AND eligibility notes mixed)
url?: string;
}
export async function scrapeChooseChicago(): Promise<RawFreeDayData[]>;
Scraping logic:
- Fetch HTML from Choose Chicago (with a timeout and a User-Agent header)
- Parse the structure:
  - Find `<h3>` headings (institution names)
  - Extract the `<ul><li>` lists that follow (all text, not just dates)
- Return structured data

Important: Choose Chicago mixes date info and eligibility notes in the same `<li>` list. The normalizer will classify each line.
Example HTML structure to parse:
<h3>Art Institute of Chicago</h3>
<ul>
  <li>January 5 - February 28 (weekdays only, 11am-close)</li>
  <li>Illinois residents with ID</li>
</ul>
Sketch (CHOOSE_CHICAGO_URL is a placeholder for the actual listings page):
import * as cheerio from "npm:cheerio";

// TODO: point this at the actual Choose Chicago free-days listings page
const CHOOSE_CHICAGO_URL = "https://www.choosechicago.com/";

export async function scrapeChooseChicago(): Promise<RawFreeDayData[]> {
  // Fetch with a timeout and User-Agent, per the scraping logic above
  const res = await fetch(CHOOSE_CHICAGO_URL, {
    signal: AbortSignal.timeout(15_000),
    headers: { "User-Agent": "chicago-free-museum-days-bot" },
  });
  if (!res.ok) throw new Error(`Choose Chicago returned ${res.status}`);
  const $ = cheerio.load(await res.text());

  const results: RawFreeDayData[] = [];
  $("h3").each((_, el) => {
    const institution = $(el).text().trim();
    const lines: string[] = [];
    // Collect every <li> between this heading and the next <h3>
    $(el)
      .nextUntil("h3", "ul")
      .find("li")
      .each((_, li) => {
        lines.push($(li).text().trim());
      });
    if (lines.length > 0) {
      results.push({ institution, rawLines: lines, url: CHOOSE_CHICAGO_URL });
    }
  });
  return results;
}
Dependencies:
- `chrono-node` (natural language date parsing)
Functions:
- `normalizeDates(rawData)` - Parse raw strings into structured dates
- `expandPattern(pattern, institution)` - Expand "every Wednesday" into concrete dates
- `parseTimeRange(timeStr)` - Normalize time formats
Key logic:
export interface FreeDayEntry {
  institution: string;
  date: string; // ISO YYYY-MM-DD
  time?: string;
  proof_required?: string;
  notes?: string;
  source_url?: string; // page the entry came from (used by loadToSQLite)
}

import * as chrono from "npm:chrono-node";
import type { RawFreeDayData } from "./scrapeChooseChicago";

export async function normalizeDates(
  rawData: RawFreeDayData[],
): Promise<FreeDayEntry[]> {
  const results: FreeDayEntry[] = [];
  for (const item of rawData) {
    for (const line of item.rawLines) {
      // Try chrono-node parsing first
      const parsed = chrono.parse(line);
      if (parsed.length > 0) {
        // Expand a single date (or a date range) into individual entries
        const start = parsed[0].start.date();
        const end = parsed[0].end?.date() ?? start;
        for (const d = new Date(start); d <= end; d.setDate(d.getDate() + 1)) {
          results.push({
            institution: item.institution,
            date: d.toISOString().slice(0, 10),
          });
        }
      } else if (line.toLowerCase().includes("every")) {
        // Handle recurring patterns like "every Wednesday"
        results.push(...expandPattern(line, item.institution));
      } else {
        // Eligibility notes ("Illinois residents with ID") and truly
        // unparseable strings land here; log for manual review
        console.warn("Could not parse:", line);
      }
    }
  }
  return results;
}
// Expand "every Wednesday" into the next 12 months of Wednesdays
function expandPattern(pattern: string, institution: string): FreeDayEntry[] {
  const days = ["sunday", "monday", "tuesday", "wednesday", "thursday", "friday", "saturday"];
  const target = days.findIndex((day) => pattern.toLowerCase().includes(day));
  if (target === -1) return []; // no recognizable weekday in the pattern
  const entries: FreeDayEntry[] = [];
  const d = new Date();
  d.setDate(d.getDate() + ((target - d.getDay() + 7) % 7)); // next occurrence
  const end = new Date(d);
  end.setMonth(end.getMonth() + 12);
  while (d < end) {
    entries.push({ institution, date: d.toISOString().slice(0, 10) });
    d.setDate(d.getDate() + 7);
  }
  return entries;
}
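`parseTimeRange()` from the function list is mostly string cleanup; a minimal sketch, assuming inputs like "11 AM – Close":

// Sketch: normalize time strings like "11 AM – Close" to "11am-close"
function parseTimeRange(timeStr: string): string {
  return timeStr
    .toLowerCase()
    .replace(/\s*(–|—|\bto\b)\s*/g, "-") // unify range separators
    .replace(/\./g, "") // "a.m." → "am"
    .replace(/\s+/g, ""); // drop remaining whitespace
}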
Schedule: Monthly (first Monday of each month)
Logic:
import { scrapeChooseChicago } from "./scrapeChooseChicago";
import { normalizeDates } from "./utils";
import { loadToSQLite } from "./db";
export default async function () {
try {
// Run all scrapers
const rawData = await scrapeChooseChicago();
// Normalize dates
const normalized = await normalizeDates(rawData);
// Load to SQLite
await loadToSQLite(normalized, "choose-chicago");
console.log(`Scraped ${normalized.length} free day entries`);
} catch (error) {
console.error("Scraper failed:", error);
// Optional: send alert/email
}
}
Future expansion:
- Add more scraper functions (Loop Chicago, individual museums)
- Call them all in sequence (see the sketch below)
- Each writes to the same `scraped_free_days` table with its own `source` identifier, so `loadToSQLite` replaces only that source's rows
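A sketch of that sequence inside the cron body (`scrapeLoopChicago` is hypothetical):

// Hypothetical multi-source orchestration for the cron body
const sources = [
  { id: "choose-chicago", scrape: scrapeChooseChicago },
  // { id: "loop-chicago", scrape: scrapeLoopChicago }, // future scraper
];

for (const { id, scrape } of sources) {
  try {
    const normalized = await normalizeDates(await scrape());
    await loadToSQLite(normalized, id);
    console.log(`${id}: loaded ${normalized.length} entries`);
  } catch (error) {
    // One failing source shouldn't block the others
    console.error(`${id} failed:`, error);
  }
}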
Schedule: Every Monday at 8am Central Time
Dependencies:
- `@atproto/api` (Bluesky SDK)
Logic:
import { BskyAgent } from "@atproto/api";
import { getUpcomingWeek } from "./db";
import type { FreeDayEntry } from "./utils";
const agent = new BskyAgent({ service: "https://bsky.social" });
export default async function () {
// Login with the app password stored in Val Town secrets
await agent.login({
identifier: process.env.BLUESKY_HANDLE!,
password: process.env.BLUESKY_PASSWORD!,
});
// Get upcoming free days
const freeDays = await getUpcomingWeek();
// Format message
const message = formatMessage(freeDays);
// Post to Bluesky
await agent.post({ text: message });
}
function formatMessage(freeDays: FreeDayEntry[]): string {
// Group by institution
// Format as readable list
// Add footer with data source link
return `
🏛️ Free museums in Chicago this week:
• Art Institute: Mon-Fri (weekdays, 11am-close)
• Field Museum: Wednesday
• MCA: Tuesday evenings 5-9pm
+${freeDays.length - 3} more → [link to API]
Data: github.com/your-repo
`.trim();
}
Post format considerations (a grouping sketch follows this list):
- Keep under 300 characters for main post
- Use thread if >5 museums
- Link to full calendar/API
- Emoji for visual interest
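A sketch of the grouping and truncation logic described above (plain `.length` is used as an approximation of Bluesky's 300-character limit):

// Sketch: one line per institution, truncated with a "+N more" footer
function formatMessage(freeDays: FreeDayEntry[]): string {
  const byInstitution = new Map<string, string[]>();
  for (const d of freeDays) {
    const when = d.time ? `${d.date} ${d.time}` : d.date;
    byInstitution.set(d.institution, [
      ...(byInstitution.get(d.institution) ?? []),
      when,
    ]);
  }
  const lines = [...byInstitution].map(
    ([name, dates]) => `• ${name}: ${dates.join(", ")}`,
  );
  let post = "🏛️ Free museums in Chicago this week:";
  let shown = 0;
  for (const line of lines) {
    if (post.length + line.length + 1 > 270) break; // leave room for footer
    post += `\n${line}`;
    shown++;
  }
  if (shown < lines.length) post += `\n+${lines.length - shown} more`;
  return post;
}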
Dependencies:
- `hono` (lightweight web framework)
Routes:
import { Hono } from 'hono';
import { sqlite } from 'https://esm.town/v/std/sqlite';
import { getFreeDays, getUpcomingWeek } from './db';
const app = new Hono();
// Get free days for date range
app.get('/free-days', async (c) => {
const start = c.req.query('start'); // YYYY-MM-DD
const end = c.req.query('end'); // YYYY-MM-DD
if (!start || !end) {
return c.json({ error: 'start and end dates required' }, 400);
}
const data = await getFreeDays(start, end);
return c.json({ data, count: data.length });
});
// Get upcoming week (convenience endpoint)
app.get('/upcoming', async (c) => {
const data = await getUpcomingWeek();
return c.json({ data, count: data.length });
});
// Get all institutions (for filtering)
app.get('/institutions', async (c) => {
// Distinct institution names across scraped data and overrides
const result = await sqlite.execute(`
  SELECT institution FROM scraped_free_days
  UNION
  SELECT institution FROM free_days_overrides
  ORDER BY institution
`);
return c.json({ institutions: result.rows.map((r) => r[0]) });
});
export default app.fetch;
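Example client call (a sketch; the URL shape follows Val Town's `username-valname.web.val.run` convention, so replace it with your val's URL):

// Hypothetical client call against the deployed HTTP val
const res = await fetch(
  "https://yourname-api.web.val.run/free-days?start=2025-03-01&end=2025-03-31",
);
const { data, count } = await res.json();
console.log(`${count} free museum days in March`);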
Future additions:
- `/ics` endpoint to generate an iCal feed
- `/institution/:name` to filter by museum
- `/today` for the current day only
Set these in Val Town secrets:
BLUESKY_HANDLE=your-handle.bsky.social
BLUESKY_PASSWORD=your-app-password
- Create the `db` val
- Run `initDatabase()`
- Manually insert test data
- Verify queries work

- Run `scrapeChooseChicago()` manually
- Inspect raw output
- Pass to `normalizeDates()`
- Verify date parsing quality
- Check for unparseable entries (aim for <20% failure rate)
- Run full scraper → normalizer → loader pipeline
- Query database, verify data looks correct
- Add manual overrides for failed parsing
- Re-query, confirm overrides work
- Test Bluesky post with hardcoded message
- Test with real database query
- Verify formatting looks good on Bluesky
- Enable cron schedule
- Test each endpoint locally
- Verify JSON responses
- Test date range edge cases
- Deploy and test public URL
- ✅ Scraper runs monthly without errors
- ✅ Database contains ≥15 Chicago institutions
- ✅ Bot posts every Monday at 8am
- ✅ API returns valid JSON for date ranges
- ✅ Manual intervention <1 hour/month
- Add Loop Chicago scraper
- Add individual museum scrapers for missing institutions
- Implement ICS calendar feed endpoint
- Support filtering by neighborhood/museum type
- Community contributions (PR process for corrections)
| Limitation | Impact | Mitigation |
|---|---|---|
| Choose Chicago updates irregularly | Stale data | Run scraper monthly, add manual overrides |
| Date parsing ~80% accurate | Missing/incorrect dates | Manual review process, improve parser over time |
| Proof-of-residency rules change | Incorrect info | Version in git, add notes field |
| Bluesky API rate limits | Posting failures | Respect rate limits, add retry logic |
| SQLite size limits | Minimal (~450 rows/year) | Monitor; archive old dates if ever needed |
Monthly:
- Review scraper logs for failures
- Check for unparseable dates
- Add manual overrides as needed
Quarterly:
- Audit data quality against official museum sites
- Update parser for new date formats
- Review Bluesky engagement metrics
Annually:
- Update recurring patterns (if museums change schedules)
- Archive old dates (optional)
- Expand to new data sources if successful
- Multi-city support: Expand to other cities with free museum programs
- ICS calendar feed: Users can subscribe in Google/Apple Calendar
- Embeddable widget: Iframe for community sites
- SMS notifications: Text alerts for favorite museums
- Accessibility filters: Filter by wheelchair access, sensory-friendly hours
- Community contributions: GitHub repo for data corrections/additions
- ❌ Don't pre-optimize date parsing - start simple, improve as needed
- ❌ Don't build complex admin UI - Val Town console + SQLite tool is enough
- ❌ Don't add authentication to API - keep it open (rate limit if abused)
- ❌ Don't overthink schema - simple tables work fine at this scale
Start with Step 1: Database Setup - create the db val and initialize the schema. Once tables exist, you can test the entire pipeline with manual data before building the scraper.