| slug: | football-stats-pipeline |
|---|---|
| date: | Dec 5, 2025 |
| readTime_en: | 10 min read |
| readTime_pt: | 10 min leitura |
| title_en: | Building a real-time football stats pipeline with AI enrichment |
| title_pt: | Pipeline de estatísticas de futebol em tempo real com IA |
| excerpt_en: | How I wired fbref data → scheduled workers → GPT analysis → live API in a weekend, entirely on serverless infrastructure. |
| excerpt_pt: | Como liguei dados do fbref → workers agendados → análise GPT → API em direto num fim de semana, tudo em infraestrutura serverless. |
| tags: | AI, TypeScript, Data Engineering |
I wanted a personal API that could answer questions like: "Which midfielders in the Premier League have the best progressive pass ratio in the last 5 games?" — with fresh data, not a CSV from 2022.
The plan: scrape fbref → process and store → enrich with AI → serve via REST. All on Val.town, zero infrastructure to manage.
```
[fbref scraper cron]
        ↓
[raw events store (Val blob)]
        ↓
[processing worker]   ← runs every 15 min
        ↓
[AI enrichment]       ← GPT-4o summaries + tags
        ↓
[HTTP API val]        ← public REST endpoints
```
fbref doesn't have an official API, so this involved parsing their HTML tables. The tricky part is rate limiting: scrape too fast and you get blocked. I settled on waiting 2 to 3 seconds before each request (a 2-second base plus up to 1 second of random jitter):
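```ts
// Wait minMs plus up to `jitter` ms of random delay, then fetch the URL.
async function fetchWithDelay(url: string, minMs = 2000, jitter = 1000): Promise<Response> {
  const delay = minMs + Math.random() * jitter;
  await new Promise((resolve) => setTimeout(resolve, delay));
  // Send an identifying User-Agent with every request.
  return fetch(url, { headers: { "User-Agent": "personal-stats-bot/1.0" } });
}
```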
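The parsing step isn't shown here. As a rough sketch of what it could look like, the helper below turns one table row into a key/value record. It assumes each cell carries a `data-stat` attribute naming its column, which should be checked against the live markup. `parseStatRow` is a hypothetical name, and a proper HTML parser would be more robust than a regex for anything beyond a quick prototype.

```ts
// Hypothetical helper: convert one <tr> of a stats table into { statName: cellText }.
// Assumption: every cell has a data-stat="..." attribute identifying its column.
function parseStatRow(rowHtml: string): Record<string, string> {
  const cellPattern = /<t[dh][^>]*\bdata-stat="([^"]+)"[^>]*>([\s\S]*?)<\/t[dh]>/g;
  const row: Record<string, string> = {};
  let match: RegExpExecArray | null;
  while ((match = cellPattern.exec(rowHtml)) !== null) {
    const stat = match[1] ?? "";
    const inner = match[2] ?? "";
    // Strip nested tags (for example player links) and surrounding whitespace.
    row[stat] = inner.replace(/<[^>]+>/g, "").trim();
  }
  return row;
}
```

Values come back as strings; converting numeric columns with `Number(...)` is left to the processing worker.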
After storing raw stats, a separate worker calls the OpenAI API to generate:
- A plain-English summary of a player's recent form
- Automatic tags (e.g. "in-form", "injury-return", "high-press-specialist")
- A form score (0–100)
The prompt is simple but the output is surprisingly useful for filtering.
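The actual prompt isn't reproduced in this post, so the following is only a sketch of the shape such a step could take: one function builds a prompt asking for JSON, and another validates whatever the model sends back before it is stored. The `Enrichment` type, the field names, and both function names are assumptions for illustration, and the OpenAI call itself is left out.

```ts
// Hypothetical shape for the three outputs listed above.
interface Enrichment {
  summary: string;   // plain-English form summary
  tags: string[];    // e.g. "in-form"
  formScore: number; // integer from 0 to 100
}

// Build a prompt that asks the model for JSON matching Enrichment.
function buildPrompt(player: string, recentStats: Record<string, number>[]): string {
  return [
    `Summarise the recent form of ${player} in plain English.`,
    `Reply with JSON only: {"summary": string, "tags": string[], "formScore": number from 0 to 100}.`,
    `Stats for the last ${recentStats.length} games:`,
    JSON.stringify(recentStats),
  ].join("\n");
}

// Validate the model's reply; throws on malformed output, clamps the score to 0-100.
function parseEnrichment(raw: string): Enrichment {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("expected a JSON object");
  }
  const { summary, tags, formScore } = data as Record<string, unknown>;
  if (typeof summary !== "string") throw new Error("summary must be a string");
  if (!Array.isArray(tags) || !tags.every((t) => typeof t === "string")) {
    throw new Error("tags must be an array of strings");
  }
  if (typeof formScore !== "number" || Number.isNaN(formScore)) {
    throw new Error("formScore must be a number");
  }
  const clamped = Math.min(100, Math.max(0, Math.round(formScore)));
  return { summary, tags: tags as string[], formScore: clamped };
}
```

Validating before storing means a malformed reply fails loudly in the worker instead of surfacing later as a broken filter in the API.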
The API now serves ~50 endpoints with sub-100ms response times (mostly from Val blob cache). Total cost: ~$2/month in OpenAI credits. Total infrastructure managed: zero.
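To give a feel for what one of those endpoints might compute, here is a sketch of the midfielder question from the top of the post. It assumes "progressive pass ratio" means progressive passes divided by total passes, summed over each player's most recent games. The `MatchLine` type and `topProgressivePassers` are hypothetical names, not the actual val code.

```ts
// Hypothetical per-player, per-match record as the processing worker might store it.
interface MatchLine {
  player: string;
  position: string; // e.g. "MF"
  date: string;     // ISO date, so string comparison sorts chronologically
  passes: number;
  progressivePasses: number;
}

// Rank midfielders by progressive passes / total passes over their last N games.
function topProgressivePassers(
  lines: MatchLine[],
  lastN = 5,
  limit = 10,
): { player: string; ratio: number }[] {
  // Group midfielder match lines by player name.
  const byPlayer: Record<string, MatchLine[]> = {};
  for (const line of lines) {
    if (line.position !== "MF") continue;
    const games = byPlayer[line.player] ?? [];
    games.push(line);
    byPlayer[line.player] = games;
  }
  const ranked = Object.keys(byPlayer).map((player) => {
    // Keep only the most recent lastN games for this player.
    const recent = (byPlayer[player] ?? [])
      .slice()
      .sort((a, b) => b.date.localeCompare(a.date))
      .slice(0, lastN);
    const passes = recent.reduce((sum, g) => sum + g.passes, 0);
    const progressive = recent.reduce((sum, g) => sum + g.progressivePasses, 0);
    return { player, ratio: passes === 0 ? 0 : progressive / passes };
  });
  // Highest ratio first.
  return ranked.sort((a, b) => b.ratio - a.ratio).slice(0, limit);
}
```

Summing before dividing weights each game by its pass volume, so a single low-touch match does not dominate the ratio.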
The code lives on Val.town at val.town/u/nmsilva — some vals are public if you want to poke around.