---
slug: football-stats-pipeline
date: Dec 5, 2025
readTime_en: 10 min read
readTime_pt: 10 min leitura
title_en: Building a real-time football stats pipeline with AI enrichment
title_pt: Pipeline de estatísticas de futebol em tempo real com IA
excerpt_en: How I wired fbref data → scheduled workers → GPT analysis → live API in a weekend, entirely on serverless infrastructure.
excerpt_pt: Como liguei dados do fbref → workers agendados → análise GPT → API em direto num fim de semana, tudo em infraestrutura serverless.
tags: AI, TypeScript, Data Engineering
---

## The goal

I wanted a personal API that could answer questions like: "Which midfielders in the Premier League have the best progressive pass ratio in the last 5 games?" — with fresh data, not a CSV from 2022.

The plan: scrape fbref → process and store → enrich with AI → serve via REST. All on Val.town, zero infrastructure to manage.

## Architecture

```
[fbref scraper cron]
       ↓
[raw events store (Val blob)]
       ↓
[processing worker]  ←  runs every 15 min
       ↓
[AI enrichment]  ←  GPT-4o summaries + tags
       ↓
[HTTP API val]  ←  public REST endpoints
```
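The last box in the diagram is just an HTTP val: a fetch-style handler. A minimal sketch of how one endpoint might look — the route, the data shape, and the in-memory map standing in for blob storage are all invented for illustration, not the real implementation:

```typescript
// Stand-in for the blob-backed store: pathname -> precomputed stats.
// In the real pipeline this lookup would hit the Val blob cache instead.
const cache = new Map<string, unknown>([
  ["/players/top-progressive", [{ player: "Example FC midfielder", ratio: 0.42 }]],
]);

// An HTTP val exposes a fetch-style handler taking a Request
// and returning a Response.
async function handler(req: Request): Promise<Response> {
  const { pathname } = new URL(req.url);
  const hit = cache.get(pathname);
  if (!hit) return new Response("Not found", { status: 404 });
  return Response.json(hit);
}
```

Because every endpoint reduces to "look up a precomputed key", the handler itself stays trivial; all the real work happens upstream in the workers.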

## The scraper

fbref doesn't have an official API, so this involved parsing their HTML tables. The tricky part is rate limiting — scrape too fast and you get blocked. I settled on a minimum 2-second delay plus up to 1 second of random jitter between requests:

```ts
// Wait a random delay (minMs up to minMs + jitter ms) before each request,
// so the scraper never hits fbref at a predictable, aggressive rate.
async function fetchWithDelay(url: string, minMs = 2000, jitter = 1000) {
  const delay = minMs + Math.random() * jitter;
  await new Promise((r) => setTimeout(r, delay));
  return fetch(url, {
    headers: { "User-Agent": "personal-stats-bot/1.0" },
  });
}
```
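The parsing itself can stay dependency-free. A rough sketch of pulling rows out of a stats table — a deliberate simplification, since real fbref tables use `data-stat` attributes and nested markup, and a proper HTML parser would be safer than regexes:

```typescript
// Extract cell text from each <tr> in an HTML fragment.
// Returns one string[] per row; header and body rows are treated alike.
function parseTableRows(html: string): string[][] {
  const rows: string[][] = [];
  const rowMatches = html.match(/<tr[^>]*>[\s\S]*?<\/tr>/g) ?? [];
  for (const row of rowMatches) {
    const cells = [...row.matchAll(/<t[dh][^>]*>([\s\S]*?)<\/t[dh]>/g)]
      // Strip any inner tags (links, spans) and surrounding whitespace.
      .map((m) => m[1].replace(/<[^>]+>/g, "").trim());
    if (cells.length > 0) rows.push(cells);
  }
  return rows;
}
```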

## AI enrichment

After storing raw stats, a separate worker calls the OpenAI API to generate:

- A plain-English summary of a player's recent form
- Automatic tags (e.g. "in-form", "injury-return", "high-press-specialist")
- A form score (0–100)
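To keep the stored output usable, the model's reply has to land in a fixed shape. A sketch of how that validation might look — the field names here are my assumption, not the actual schema:

```typescript
interface Enrichment {
  summary: string;
  tags: string[];
  formScore: number; // clamped to 0–100
}

// Coerce whatever JSON the model returned into the expected shape:
// stringify the summary, drop non-string tags, clamp the score.
function parseEnrichment(raw: string): Enrichment {
  const data = JSON.parse(raw);
  return {
    summary: String(data.summary ?? ""),
    tags: Array.isArray(data.tags)
      ? data.tags.filter((t: unknown) => typeof t === "string")
      : [],
    formScore: Math.min(100, Math.max(0, Number(data.formScore) || 0)),
  };
}
```

Validating at the boundary means a malformed model reply degrades to empty fields instead of poisoning the store.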

The prompt is simple but the output is surprisingly useful for filtering.

## Result

The API now serves ~50 endpoints with sub-100ms response times (mostly from Val blob cache). Total cost: ~$2/month in OpenAI credits. Total infrastructure managed: zero.
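Most of that speed comes from read-through caching in front of the store. This is not Val Town's blob API — just a generic sketch of the pattern the blob cache plays in the pipeline:

```typescript
// Read-through cache with a TTL: the first call (or any call after expiry)
// runs the loader; every other call returns the memoized value.
function cached<T>(ttlMs: number, load: () => Promise<T>) {
  let value: T | undefined;
  let expires = 0;
  return async (): Promise<T> => {
    const now = Date.now();
    if (value === undefined || now >= expires) {
      value = await load();
      expires = now + ttlMs;
    }
    return value;
  };
}
```

With a 15-minute processing cadence, a TTL in the same range means most requests never touch anything slower than the cache.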

The code lives on Val.town at val.town/u/nmsilva — some vals are public if you want to poke around.

---pt---

## O objectivo

Queria uma API pessoal que respondesse a perguntas como: "Quais médios da Premier League têm o melhor rácio de passes progressivos nos últimos 5 jogos?" — com dados frescos, não um CSV de 2022.

O plano: fazer scraping do fbref → processar e armazenar → enriquecer com IA → servir via REST. Tudo no Val.town, zero infraestrutura a gerir.

## Arquitectura

```
[cron scraper fbref]
       ↓
[armazenamento de eventos raw (Val blob)]
       ↓
[worker de processamento]  ←  corre a cada 15 min
       ↓
[enriquecimento com IA]  ←  resumos GPT-4o + tags
       ↓
[HTTP API val]  ←  endpoints REST públicos
```

## O scraper

O fbref não tem uma API oficial, por isso isto envolvia fazer parse das tabelas HTML. A parte complicada é o rate limiting — scraping demasiado rápido e ficas bloqueado. Cheguei a um atraso mínimo de 2 segundos mais até 1 segundo de jitter aleatório entre pedidos:

```ts
// Espera um atraso aleatório (minMs até minMs + jitter ms) antes de cada pedido,
// para que o scraper nunca atinja o fbref a um ritmo previsível e agressivo.
async function fetchWithDelay(url: string, minMs = 2000, jitter = 1000) {
  const delay = minMs + Math.random() * jitter;
  await new Promise((r) => setTimeout(r, delay));
  return fetch(url, {
    headers: { "User-Agent": "personal-stats-bot/1.0" },
  });
}
```

## Enriquecimento com IA

Após armazenar estatísticas raw, um worker separado chama a API da OpenAI para gerar:

- Um resumo em linguagem simples da forma recente de um jogador
- Tags automáticas (ex: "em forma", "regresso de lesão", "especialista em pressing")
- Uma pontuação de forma (0–100)

O prompt é simples mas o output é surpreendentemente útil para filtrar.

## Resultado

A API serve agora ~50 endpoints com tempos de resposta abaixo de 100ms (principalmente cache do Val blob). Custo total: ~$2/mês em créditos OpenAI. Infraestrutura gerida: zero.

O código está no Val.town em val.town/u/nmsilva — alguns vals são públicos se quiseres explorar.
