A Bluesky bot that posts weekly updates about free museum days in Chicago, built on Val Town with SQLite storage, automated scraping, and a public API.
Vals:
├── scrapeChooseChicago (function - extracts data from Choose Chicago)
├── scraper (cron - orchestrates all scraper functions)
├── utils (function - normalizeDates, date expansion helpers)
├── db (function - loadToSQLite, query functions)
├── bot (cron - postToBluesky, runs Monday mornings)
└── api (HTTP - Hono API for data access)
SQLite Database:
├── scraped_free_days (auto-populated by scrapers)
├── free_days_overrides (manual corrections/additions)
└── Union query merges both (overrides win on conflicts)
Choose Chicago (HTML)
↓
scrapeChooseChicago() → [{name, rawDates[], url, notes}]
↓
normalizeDates() → [{institution, date, time, proof_required, notes}]
↓
loadToSQLite() → scraped_free_days table
↓
Query (union with overrides)
↓
[Branch 1] postToBluesky() → Bluesky post
[Branch 2] API endpoint → JSON response
Auto-populated by scrapers. Replaced on each scraper run.
CREATE TABLE IF NOT EXISTS scraped_free_days (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  institution TEXT NOT NULL,
  date TEXT NOT NULL,        -- ISO format YYYY-MM-DD
  time TEXT,                 -- e.g., "9am-5pm", "11am-close"
  proof_required TEXT,       -- e.g., "Illinois ID required", "ZIP code verification"
  notes TEXT,                -- Additional info (holidays, registration required, etc.)
  source_url TEXT,           -- Where this data was scraped from
  scraped_at TEXT NOT NULL,  -- ISO timestamp of scrape
  UNIQUE(institution, date)  -- Prevent duplicate entries
);
CREATE INDEX IF NOT EXISTS idx_date ON scraped_free_days(date);
CREATE INDEX IF NOT EXISTS idx_institution ON scraped_free_days(institution);
Manual corrections for when scrapers fail or data needs fixing.
CREATE TABLE IF NOT EXISTS free_days_overrides (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  institution TEXT NOT NULL,
  date TEXT NOT NULL,        -- ISO format YYYY-MM-DD
  time TEXT,
  proof_required TEXT,
  notes TEXT,
  override_reason TEXT,      -- Why this override exists
  created_at TEXT NOT NULL,  -- When override was added
  UNIQUE(institution, date)
);
CREATE INDEX IF NOT EXISTS idx_override_date ON free_days_overrides(date);
When querying for free days, always union both tables with overrides taking precedence:
-- Get free days for a date range. With a lone MIN() aggregate, SQLite
-- returns the other selected columns from the row holding the minimum,
-- so override rows (priority 1) shadow scraped rows (priority 2).
SELECT institution, date, time, proof_required, notes, MIN(priority)
FROM (
  SELECT institution, date, time, proof_required, notes, 1 AS priority
  FROM free_days_overrides
  WHERE date BETWEEN ? AND ?
  UNION ALL
  SELECT institution, date, time, proof_required, notes, 2 AS priority
  FROM scraped_free_days
  WHERE date BETWEEN ? AND ?
)
GROUP BY institution, date
ORDER BY date, institution;
Create the db val with:
Functions:
- `initDatabase()` - Create tables and indexes
- `loadToSQLite(data, source)` - Insert scraped data
- `getFreeDays(startDate, endDate)` - Query with override logic
- `addOverride(institution, date, details)` - Manual correction helper
- `getUpcomingWeek()` - Helper for bot (next 7 days from today)
Implementation:
import { sqlite } from "https://esm.town/v/std/sqlite";
// Initialize tables
export async function initDatabase() {
  // Create scraped_free_days table
  // Create free_days_overrides table
  // Create indexes
}

// Load scraped data (replaces old scraped data)
export async function loadToSQLite(data: Array<FreeDayEntry>, source: string) {
  // Begin transaction
  // Delete old entries from same source
  // Insert new entries
  // Commit
}

// Query with override logic
export async function getFreeDays(startDate: string, endDate: string) {
  // Union query as shown above
}

// Helper for bot
export async function getUpcomingWeek() {
  const today = new Date().toISOString().split('T')[0];
  const nextWeek = new Date(Date.now() + 7 * 24 * 60 * 60 * 1000)
    .toISOString().split('T')[0];
  return getFreeDays(today, nextWeek);
}
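Fleshing out the write paths, here is a hedged sketch assuming std/sqlite's `execute()`/`batch()` accept `{ sql, args }` statements (verify the exact result shape against the current std/sqlite docs); `getFreeDays` is then just the union query above bound with `[startDate, endDate, startDate, endDate]`:

import { sqlite } from "https://esm.town/v/std/sqlite";

// Mirrors the FreeDayEntry interface defined in the utils val
type FreeDayEntry = {
  institution: string;
  date: string;
  time?: string;
  proof_required?: string;
  notes?: string;
};

// Replace all rows from one source in a single batch: delete, then bulk insert
export async function loadToSQLite(data: FreeDayEntry[], source: string) {
  await sqlite.batch([
    { sql: "DELETE FROM scraped_free_days WHERE source_url = ?", args: [source] },
    ...data.map((d) => ({
      sql: `INSERT OR REPLACE INTO scraped_free_days
              (institution, date, time, proof_required, notes, source_url, scraped_at)
            VALUES (?, ?, ?, ?, ?, ?, ?)`,
      args: [d.institution, d.date, d.time ?? null, d.proof_required ?? null,
             d.notes ?? null, source, new Date().toISOString()],
    })),
  ]);
}

// Manual correction helper; INSERT OR REPLACE makes it safe to re-run
export async function addOverride(
  institution: string,
  date: string,
  details: { time?: string; proof_required?: string; notes?: string; reason?: string } = {},
) {
  await sqlite.execute({
    sql: `INSERT OR REPLACE INTO free_days_overrides
            (institution, date, time, proof_required, notes, override_reason, created_at)
          VALUES (?, ?, ?, ?, ?, ?, ?)`,
    args: [institution, date, details.time ?? null, details.proof_required ?? null,
           details.notes ?? null, details.reason ?? null, new Date().toISOString()],
  });
}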
Dependencies:
- `cheerio` (HTML parsing)
Function signature:
export interface RawFreeDayData {
  institution: string;
  rawDates: string[]; // Unparsed date strings from HTML
  url?: string;
  notes?: string;
}
export async function scrapeChooseChicago(): Promise<RawFreeDayData[]>
Scraping logic:
- Fetch HTML from Choose Chicago
- Parse structure:
  - Find `<h3>` headings (institution names)
  - Extract the following `<ul><li>` lists (date strings)
- Return structured data
Example HTML structure to parse:
<h3>Art Institute of Chicago</h3>
<ul>
  <li>January 5 - February 28 (weekdays only, 11am-close)</li>
  <li>Illinois residents with ID</li>
</ul>
Implementation:
import * as cheerio from 'npm:cheerio';

// TODO: point this at the Choose Chicago free-museum-days page
const CHOOSE_CHICAGO_URL = 'https://www.choosechicago.com/';

export async function scrapeChooseChicago(): Promise<RawFreeDayData[]> {
  const html = await (await fetch(CHOOSE_CHICAGO_URL)).text();
  const $ = cheerio.load(html);
  const results: RawFreeDayData[] = [];
  $('h3').each((_, el) => {
    const institution = $(el).text().trim();
    const rawDates: string[] = [];
    // Collect every <li> between this heading and the next <h3>
    $(el).nextUntil('h3', 'ul').find('li').each((_, li) => {
      rawDates.push($(li).text().trim());
    });
    // Headings with no date list are page chrome, not museums
    if (rawDates.length > 0) results.push({ institution, rawDates });
  });
  return results;
}
Dependencies:
- `chrono-node` (natural language date parsing)
Functions:
- `normalizeDates(rawData)` - Parse raw strings into structured dates
- `expandPattern(pattern)` - Expand "every Wednesday" into concrete dates for the next 12 months
- `parseTimeRange(timeStr)` - Normalize time formats (sketched after the code below)
Key logic:
import * as chrono from 'npm:chrono-node';

export interface FreeDayEntry {
  institution: string;
  date: string; // ISO YYYY-MM-DD
  time?: string;
  proof_required?: string;
  notes?: string;
}
export async function normalizeDates(
  rawData: RawFreeDayData[]
): Promise<FreeDayEntry[]> {
  const results: FreeDayEntry[] = [];
  for (const item of rawData) {
    for (const rawDate of item.rawDates) {
      // Try chrono-node parsing first
      const parsed = chrono.parse(rawDate);
      if (parsed.length > 0) {
        // Expand a range (or a single date) into individual dates
        const start = parsed[0].start.date();
        const end = parsed[0].end ? parsed[0].end.date() : start;
        for (let d = new Date(start); d <= end; d.setDate(d.getDate() + 1)) {
          results.push({
            institution: item.institution,
            date: d.toISOString().split('T')[0],
            notes: rawDate, // keep the raw text for manual review
          });
        }
      } else if (rawDate.toLowerCase().includes('every')) {
        // Handle recurring patterns like "every Wednesday"
        results.push(
          ...expandPattern(rawDate).map((date) => ({
            institution: item.institution,
            date,
            notes: rawDate,
          })),
        );
      } else {
        // Log unparseable entry for manual review
        console.warn('Could not parse:', rawDate);
      }
    }
  }
  return results;
}
// Expand "every Wednesday" into next 12 months of Wednesdays; returns ISO
// dates and lets the caller attach institution and notes
function expandPattern(pattern: string): string[] {
  const days = ['sunday', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday'];
  // Parse day of week
  const target = days.findIndex((d) => pattern.toLowerCase().includes(d));
  if (target === -1) return [];
  // Generate dates for next 12 months, one week at a time
  // (toISOString() is UTC; fine for a morning cron, adjust if run near midnight)
  const cursor = new Date();
  while (cursor.getDay() !== target) cursor.setDate(cursor.getDate() + 1);
  const end = new Date();
  end.setMonth(end.getMonth() + 12);
  const dates: string[] = [];
  for (; cursor <= end; cursor.setDate(cursor.getDate() + 7)) {
    dates.push(cursor.toISOString().split('T')[0]);
  }
  return dates;
}
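`parseTimeRange` from the function list isn't shown above; a minimal sketch, assuming the page mostly uses formats like "11 A.M. – 5 P.M." or "9am to 5pm":

// Normalize time strings like "11 A.M. – 5 P.M." to the compact "11am-5pm"
// form stored in the database; "11am-close" passes through unchanged
function parseTimeRange(timeStr: string): string {
  return timeStr
    .toLowerCase()
    .replace(/\./g, '')                    // "a.m." -> "am"
    .replace(/\s*[–—]\s*|\s+to\s+/g, '-')  // unify range separators
    .replace(/\s+/g, '');                  // drop remaining spaces
}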
Schedule: Monthly (first Monday of each month)
Logic:
import { scrapeChooseChicago } from "./scrapeChooseChicago";
import { normalizeDates } from "./utils";
import { loadToSQLite } from "./db";
export default async function() {
  try {
    // Run all scrapers
    const rawData = await scrapeChooseChicago();
    // Normalize dates
    const normalized = await normalizeDates(rawData);
    // Load to SQLite
    await loadToSQLite(normalized, 'choose-chicago');
    console.log(`Scraped ${normalized.length} free day entries`);
  } catch (error) {
    console.error('Scraper failed:', error);
    // Optional: send alert/email
  }
}
Future expansion:
- Add more scraper functions (Loop Chicago, individual museums)
- Call them all in sequence (see the sketch below)
- Each writes to the same `scraped_free_days` table with a different `source_url`
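A sketch of that multi-source loop, assuming every scraper shares the RawFreeDayData contract (`scrapeLoopChicago` is hypothetical and left commented out):

import { scrapeChooseChicago } from './scrapeChooseChicago';
import { normalizeDates } from './utils';
import { loadToSQLite } from './db';

const scrapers = [
  { source: 'choose-chicago', run: scrapeChooseChicago },
  // { source: 'loop-chicago', run: scrapeLoopChicago },
];

export default async function() {
  for (const { source, run } of scrapers) {
    try {
      const normalized = await normalizeDates(await run());
      await loadToSQLite(normalized, source);
      console.log(`${source}: loaded ${normalized.length} entries`);
    } catch (error) {
      // One failing source shouldn't abort the rest
      console.error(`${source} scraper failed:`, error);
    }
  }
}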
Schedule: Every Monday at 8am Central Time
Dependencies:
- `@atproto/api` (Bluesky SDK)
Logic:
import { BskyAgent } from 'npm:@atproto/api';
import { getUpcomingWeek } from './db';

const agent = new BskyAgent({ service: 'https://bsky.social' });

export default async function() {
  // Login to Bluesky
  await agent.login({
    identifier: process.env.BLUESKY_HANDLE!,
    password: process.env.BLUESKY_PASSWORD!,
  });
  // Get upcoming free days
  const freeDays = await getUpcomingWeek();
  // Format message
  const message = formatMessage(freeDays);
  // Post to Bluesky
  await agent.post({ text: message });
}
function formatMessage(freeDays: FreeDayEntry[]): string {
  // Group by institution
  // Format as readable list
  // Add footer with data source link
  return `
🏛️ Free museums in Chicago this week:
• Art Institute: Mon-Fri (weekdays, 11am-close)
• Field Museum: Wednesday
• MCA: Tuesday evenings 5-9pm
+${freeDays.length - 3} more → [link to API]
Data: github.com/your-repo
`.trim();
}
Post format considerations:
- Keep under 300 characters for main post
- Use thread if >5 museums
- Link to full calendar/API
- Emoji for visual interest
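Following those constraints, a hedged sketch of a real formatMessage (the grouping and the 300-character trim are the substance; exact wording is a placeholder):

function formatMessage(freeDays: FreeDayEntry[]): string {
  // Group by institution so each museum appears once with all its days
  const byInstitution = new Map<string, string[]>();
  for (const { institution, date } of freeDays) {
    // Parse at noon so the UTC offset can't shift the weekday
    const weekday = new Date(`${date}T12:00:00`).toLocaleDateString('en-US', {
      weekday: 'short',
    });
    byInstitution.set(institution, [...(byInstitution.get(institution) ?? []), weekday]);
  }

  const header = '🏛️ Free museums in Chicago this week:';
  const lines = [...byInstitution].map(([name, days]) => `• ${name}: ${days.join(', ')}`);

  // Drop museums from the end until the post fits Bluesky's 300-char limit
  let kept = lines.length;
  let message = [header, ...lines].join('\n');
  while (message.length > 290 && kept > 1) {
    kept--;
    message = [header, ...lines.slice(0, kept), `+${lines.length - kept} more`].join('\n');
  }
  return message;
}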
Dependencies:
- `hono` (lightweight web framework)
Routes:
import { Hono } from 'npm:hono';
import { getFreeDays, getUpcomingWeek } from './db';

const app = new Hono();

// Get free days for date range
app.get('/free-days', async (c) => {
  const start = c.req.query('start'); // YYYY-MM-DD
  const end = c.req.query('end');     // YYYY-MM-DD
  if (!start || !end) {
    return c.json({ error: 'start and end dates required' }, 400);
  }
  const data = await getFreeDays(start, end);
  return c.json({ data, count: data.length });
});

// Get upcoming week (convenience endpoint)
app.get('/upcoming', async (c) => {
  const data = await getUpcomingWeek();
  return c.json({ data, count: data.length });
});

// Get all institutions (for filtering)
app.get('/institutions', async (c) => {
  // TODO: SELECT DISTINCT institution from both tables via the db val
  const institutions: string[] = [];
  return c.json({ institutions });
});

export default app.fetch;
Future additions:
- `/ics` endpoint to generate an iCal feed (sketched below)
- `/institution/:name` to filter by museum
- `/today` for the current day only
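Of these, `/ics` is mostly string assembly; a hedged sketch that hand-rolls the iCal text rather than pulling in a library (the calendar name and UID scheme are placeholders):

// Serve upcoming free days as an iCal feed users can subscribe to
app.get('/ics', async (c) => {
  const data = await getUpcomingWeek();
  const events = data.map((d: FreeDayEntry) => [
    'BEGIN:VEVENT',
    `UID:${d.institution.replace(/\W+/g, '-')}-${d.date}@chicago-free-museums`,
    `DTSTART;VALUE=DATE:${d.date.replaceAll('-', '')}`, // all-day event
    `SUMMARY:Free day: ${d.institution}`,
    'END:VEVENT',
  ].join('\r\n'));
  const ics = [
    'BEGIN:VCALENDAR',
    'VERSION:2.0',
    'PRODID:-//chicago-free-museums//EN',
    ...events,
    'END:VCALENDAR',
  ].join('\r\n');
  return c.body(ics, 200, { 'Content-Type': 'text/calendar' });
});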
Set these in Val Town secrets:
BLUESKY_HANDLE=your-handle.bsky.social
BLUESKY_PASSWORD=your-app-password
- Create the `db` val
- Run `initDatabase()`
- Manually insert test data (see the smoke test below)
- Verify queries work
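This first phase can be exercised from a scratch val before any scraper exists; a hedged smoke test, assuming the `db` exports listed earlier:

import { initDatabase, loadToSQLite, getFreeDays, addOverride } from './db';

await initDatabase();
// Seed one fake scraped row, then shadow its time with an override
await loadToSQLite(
  [{ institution: 'Test Museum', date: '2025-06-04', time: '9am-5pm' }],
  'manual-test',
);
await addOverride('Test Museum', '2025-06-04', {
  time: '10am-4pm',
  reason: 'testing override precedence',
});
// Expect a single row whose time is "10am-4pm" (the override wins)
console.log(await getFreeDays('2025-06-01', '2025-06-30'));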
- Run `scrapeChooseChicago()` manually
- Inspect raw output
- Pass to `normalizeDates()`
- Verify date parsing quality
- Check for unparseable entries (aim for <20% failure rate)
- Run full scraper → normalizer → loader pipeline
- Query database, verify data looks correct
- Add manual overrides for failed parsing
- Re-query, confirm overrides work
- Test Bluesky post with hardcoded message
- Test with real database query
- Verify formatting looks good on Bluesky
- Enable cron schedule
- Test each endpoint locally
- Verify JSON responses
- Test date range edge cases
- Deploy and test public URL
- ✅ Scraper runs monthly without errors
- ✅ Database contains ≥15 Chicago institutions
- ✅ Bot posts every Monday at 8am
- ✅ API returns valid JSON for date ranges
- ✅ Manual intervention <1 hour/month
- Add Loop Chicago scraper
- Add individual museum scrapers for missing institutions
- Implement ICS calendar feed endpoint
- Support filtering by neighborhood/museum type
- Community contributions (PR process for corrections)
| Limitation | Impact | Mitigation |
|---|---|---|
| Choose Chicago updates irregularly | Stale data | Run scraper monthly, add manual overrides |
| Date parsing ~80% accurate | Missing/incorrect dates | Manual review process, improve parser over time |
| Proof-of-residency rules change | Incorrect info | Version in git, add notes field |
| Bluesky API rate limits | Posting failures | Respect rate limits, add retry logic |
| SQLite size limits | Unlikely to matter (~450 rows/year) | Monitor; archive old dates if needed |
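For the rate-limit row, the retry logic can be a ten-line wrapper; a sketch with exponential backoff (the helper name is ours, not part of @atproto/api):

// Retry a flaky async call with exponential backoff before giving up
async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  for (let i = 1; ; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i >= attempts) throw err;
      await new Promise((r) => setTimeout(r, 1000 * 2 ** i)); // 2s, 4s, ...
    }
  }
}

// In the bot: await withRetry(() => agent.post({ text: message }));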
Monthly:
- Review scraper logs for failures
- Check for unparseable dates
- Add manual overrides as needed
Quarterly:
- Audit data quality against official museum sites
- Update parser for new date formats
- Review Bluesky engagement metrics
Annually:
- Update recurring patterns (if museums change schedules)
- Archive old dates (optional)
- Expand to new data sources if successful
- Multi-city support: Expand to other cities with free museum programs
- ICS calendar feed: Users can subscribe in Google/Apple Calendar
- Embeddable widget: Iframe for community sites
- SMS notifications: Text alerts for favorite museums
- Accessibility filters: Filter by wheelchair access, sensory-friendly hours
- Community contributions: GitHub repo for data corrections/additions
- ❌ Don't pre-optimize date parsing - start simple, improve as needed
- ❌ Don't build complex admin UI - Val Town console + SQLite tool is enough
- ❌ Don't add authentication to API - keep it open (rate limit if abused)
- ❌ Don't overthink schema - simple tables work fine at this scale
Start with Step 1: Database Setup - create the db val and initialize the schema. Once tables exist, you can test the entire pipeline with manual data before building the scraper.