A Hono API server that fetches, caches, and processes Groq documentation pages with token counting and AI-generated metadata.
- Fetches documentation pages from Groq's console
 - Caches page content, metadata, token counts, and embeddings in SQLite
 - Token counting using tiktoken (GPT-4 encoding)
 - AI-generated metadata (categories, tags, use cases, sample questions)
 - Content embeddings generation (currently fake, ready for Groq API integration)
 - Hash-based change detection to skip unchanged pages during recalculation
 - Rate limiting with async-sema to avoid WAF blocking
 - RESTful API endpoints for accessing pages and managing cache
 - Modular code structure (utils.ts for utilities, groq.ts for Groq API functions)
 
On first run, the cache will be empty. You should populate it by running:
GET /cache/recalculate
This will:
- Fetch all pages from the URLs list
 - Calculate token counts for each page
 - Generate AI metadata (categories, tags, use cases, questions)
 - Generate embeddings for each page
 - Calculate content hashes for change detection
 - Store everything in the SQLite cache
 - Return a summary of what was cached
 
Important: This will take some time as it processes all pages, generates metadata, and calculates tokens for each. Be patient!
Note: On subsequent runs, unchanged pages (detected by content hash) will be automatically skipped unless you use force mode.
Check that the cache was populated:
GET /cache/stats
This returns:
{ "cachedPages": 121, "totalTokens": 1234567 }
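For example, a small client script can trigger population and then verify the stats. This is a sketch; the base URL is a placeholder for wherever you deploy the server.

// Sketch: populate the cache, then verify it was filled.
const BASE_URL = "https://your-deployment.example.com"; // placeholder

const recalc = await fetch(`${BASE_URL}/cache/recalculate`);
console.log(await recalc.json()); // summary of processed/skipped pages

const stats = await fetch(`${BASE_URL}/cache/stats`);
console.log(await stats.json()); // e.g. { cachedPages: 121, totalTokens: 1234567 }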
You should run /cache/recalculate in these scenarios:
- First time setup - Cache is empty
 - URL list changes - You've added or removed URLs from the `urls` array
 - Content updates - Documentation pages have been updated and you want fresh data
 - Token count needed - You need accurate token counts for new content
 - Metadata refresh - You want to regenerate AI metadata or embeddings
 
By default, /cache/recalculate uses hash-based change detection:
GET /cache/recalculate
Behavior:
- Fetches each page and calculates its content hash (SHA-256)
 - Compares hash with cached version
 - Skips pages with unchanged content (saves time and API calls)
 - Only processes pages that have changed
 - Still generates embeddings and metadata for changed pages
 
Response includes:
- `processed` - Number of pages actually processed
- `skipped` - Number of pages skipped (unchanged)
- `force` - Always `false` in default mode
To force recalculation of all pages (ignoring hash checks):
GET /cache/recalculate?force=true
Use cases:
- Regenerating all metadata/embeddings even if content unchanged
 - After updating metadata generation prompts
 - When you want to ensure everything is fresh
 
For single page updates, you can use:
GET /cache/clear/:path
This clears the cache for a specific page. The next time that page is requested via /page/:path, it will be fetched fresh and recached.
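As a sketch (base URL again a placeholder), a single-page refresh looks like this:

// Clear one page's cache entry, then re-request it so it is fetched fresh and recached.
const BASE_URL = "https://your-deployment.example.com"; // placeholder
await fetch(`${BASE_URL}/cache/clear/api-reference`);
const page = await fetch(`${BASE_URL}/page/api-reference`);
console.log((await page.json()).tokenCount);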
- Weekly: Run recalculate (default mode) to catch any documentation updates efficiently
 - After major docs changes: Use force mode to regenerate everything
 - When adding new pages: Update the `urls` array, then run recalculate
Get the root docs page (cached if available).
Get a specific page by path. Examples:
/page/api-reference
/page/agentic-tooling/compound-beta
/page/model/llama-3.1-8b-instant
Response includes:
- `url` - The source URL
- `content` - Full page content with frontmatter
- `charCount` - Character count
- `tokenCount` - Token count (calculated with tiktoken)
- All frontmatter fields flattened (title, description, image, etc.)
 
Caching: Responses are cached. First request fetches and caches, subsequent requests are instant.
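For TypeScript consumers, the response shape can be modeled roughly as below. This is a sketch based on the fields listed above; because frontmatter fields are flattened into the object, additional keys may appear.

// Rough shape of a /page/:path response; frontmatter fields (title, description, image, ...)
// are merged into the top level, so the index signature covers them.
interface PageResponse {
  url: string;
  content: string;
  charCount: number;
  tokenCount: number;
  title?: string;
  description?: string;
  [frontmatterField: string]: unknown;
}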
Get a list of all available page paths.
Response:
[ "docs", "agentic-tooling", "api-reference", ... ]
Search pages by query string.
Query Parameters:
- `q` (required) - Search query string
- `limit` (optional) - Maximum number of results (default: 10)
- `minScore` (optional) - Minimum score threshold (default: 0)
Example:
GET /search?q=authentication&limit=5
Response:
{
  "query": "authentication",
  "results": [
    {
      "path": "api-reference",
      "url": "https://console.groq.com/docs/api-reference.md",
      "title": "API Reference",
      "score": 45,
      "snippet": "...authentication tokens are required for all API requests..."
    },
    {
      "path": "quickstart",
      "url": "https://console.groq.com/docs/quickstart.md",
      "title": "Quick Start",
      "score": 32,
      "snippet": "...get your API key for authentication..."
    }
  ],
  "totalResults": 2,
  "totalPages": 121
}
Search Features:
- Keyword matching in titles and content
 - Metadata boost (tags, categories, use cases)
 - Score-based ranking
 - Content snippets around matches
 - Uses cached pages when available for faster results
 
Note: Currently uses keyword-based search. Future versions will use embeddings for semantic search.
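The scoring approach can be pictured with a sketch like the following. It is illustrative only; the actual weights and matching logic in search.ts may differ.

// Illustrative keyword scoring: title matches weigh most, metadata (tags/categories) adds a
// boost, and raw content matches contribute the rest, capped so long pages don't dominate.
function scorePage(
  query: string,
  page: { title: string; content: string; tags: string[]; categories: string[] },
): number {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  let score = 0;
  for (const term of terms) {
    if (page.title.toLowerCase().includes(term)) score += 10;
    if (page.tags.some((t) => t.toLowerCase().includes(term))) score += 5;
    if (page.categories.some((c) => c.toLowerCase().includes(term))) score += 5;
    const contentMatches = page.content.toLowerCase().split(term).length - 1;
    score += Math.min(contentMatches, 20);
  }
  return score;
}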
Get metadata for all pages (does not use cache - fetches fresh).
Response:
{
  "pages": [
    { "url": "...", "charCount": 1234, "frontmatter": {...} }
  ],
  "contents": [...],
  "totalPages": 121,
  "totalChars": 1234567
}
Get cache statistics.
Response:
{ "cachedPages": 121, "totalTokens": 1234567 }
Clear the entire cache.
Response:
{ "message": "Cache cleared", "success": true }
Clear cache for a specific page.
Example:
GET /cache/clear/api-reference
Response:
{ "message": "Cache cleared for api-reference", "success": true }
Recalculate pages with AI metadata and embeddings generation.
Query Parameters:
`force` (optional): Set to `true` to force recalculation of all pages, ignoring hash checks
Default Mode (no query params):
GET /cache/recalculate
Force Mode:
GET /cache/recalculate?force=true
Response (Default Mode):
{
  "message": "Recalculated 5 pages, skipped 116 unchanged pages",
  "results": [
    {
      "path": "api-reference",
      "url": "https://console.groq.com/docs/api-reference.md",
      "charCount": 1234,
      "tokenCount": 567,
      "title": "API Reference",
      "metadata": {
        "categories": ["API", "Reference"],
        "tags": ["api", "endpoints", "rest"],
        "useCases": ["Integrating with Groq API"],
        "questions": ["How do I authenticate?", "What endpoints are available?"]
      }
    },
    {
      "path": "docs",
      "skipped": true,
      "reason": "Content unchanged (hash matches)"
    }
  ],
  "totalPages": 121,
  "processed": 5,
  "skipped": 116,
  "withMetadata": 5,
  "withoutMetadata": 0,
  "cached": true,
  "force": false
}
Response (Force Mode):
{
  "message": "Recalculated 121 pages with AI metadata (force mode)",
  "results": [...],
  "totalPages": 121,
  "processed": 121,
  "skipped": 0,
  "force": true
}
What it does:
- Fetches all pages (or skips unchanged ones in default mode)
 - Calculates token counts
 - Generates AI metadata (categories, tags, use cases, questions)
 - Generates embeddings (currently fake, ready for Groq API)
 - Calculates content hashes for change detection
 - Stores everything in cache
 
Important: This can take several minutes depending on:
- Number of pages to process (skipped pages are fast)
 - Network speed
 - Token calculation time
 - AI metadata generation time (uses Groq API)
 
First Request:
- Check cache → Not found
- Fetch from URL
- Calculate tokens
- Store in cache
- Return data

Subsequent Requests:
- Check cache → Found
- Return cached data immediately
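In code, this is the classic cache-aside pattern. The sketch below uses the helper names mentioned later in this README (getFromCache, setCache, getTextFromUrl, calculateTokenCount); their signatures here are assumptions.

// Helper signatures are assumptions; the real ones live in utils.ts.
declare function getFromCache(url: string): Promise<CacheEntry | null>;
declare function setCache(url: string, entry: CacheEntry): Promise<void>;
declare function getTextFromUrl(url: string): Promise<string>;
declare function calculateTokenCount(text: string): number;

interface CacheEntry {
  url: string;
  content: string;
  charCount: number;
  tokenCount: number;
  cachedAt: number;
}

// Cache-aside: return the cached entry if present; otherwise fetch, compute, store, return.
async function getPage(url: string): Promise<CacheEntry> {
  const cached = await getFromCache(url);
  if (cached) return cached; // subsequent requests: instant

  const content = await getTextFromUrl(url);       // rate-limited fetch of the .md page
  const tokenCount = calculateTokenCount(content); // tiktoken-based count
  const entry: CacheEntry = {
    url,
    content,
    charCount: content.length,
    tokenCount,
    cachedAt: Date.now(),
  };
  await setCache(url, entry);
  return entry;
}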
 
 
Cache is stored in SQLite with the following schema:
CREATE TABLE groq_docs_cache_v3 (
  url TEXT PRIMARY KEY,
  content TEXT NOT NULL,
  charCount INTEGER NOT NULL,
  tokenCount INTEGER,
  frontmatter TEXT NOT NULL,
  metadata TEXT,
  contentHash TEXT,
  embeddings TEXT,
  cachedAt INTEGER NOT NULL
)
Fields:
- `url` - Source URL (primary key)
- `content` - Full page content with frontmatter
- `charCount` - Character count
- `tokenCount` - Token count (calculated with tiktoken)
- `frontmatter` - Parsed frontmatter (JSON)
- `metadata` - AI-generated metadata (categories, tags, use cases, questions)
- `contentHash` - SHA-256 hash of content (for change detection)
- `embeddings` - Content embeddings vector (JSON array)
- `cachedAt` - Timestamp when cached
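A sketch of a lookup and upsert against this table, assuming Val Town's std sqlite client with an execute({ sql, args }) method; the import path and result shape are assumptions.

import { sqlite } from "https://esm.town/v/std/sqlite"; // assumed import path

// Look up a cached page by URL.
async function lookupCached(url: string) {
  const result = await sqlite.execute({
    sql: "SELECT content, tokenCount FROM groq_docs_cache_v3 WHERE url = ?",
    args: [url],
  });
  return result.rows.length > 0 ? result.rows[0] : null;
}

// Insert or update a cache row (upsert keyed on the url primary key).
async function upsertCached(url: string, content: string, tokenCount: number, frontmatter: unknown) {
  await sqlite.execute({
    sql: `INSERT INTO groq_docs_cache_v3 (url, content, charCount, tokenCount, frontmatter, cachedAt)
          VALUES (?, ?, ?, ?, ?, ?)
          ON CONFLICT(url) DO UPDATE SET
            content = excluded.content,
            charCount = excluded.charCount,
            tokenCount = excluded.tokenCount,
            frontmatter = excluded.frontmatter,
            cachedAt = excluded.cachedAt`,
    args: [url, content, content.length, tokenCount, JSON.stringify(frontmatter), Date.now()],
  });
}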
Cache is invalidated when:
- You manually clear it via /cache/clear
- You recalculate via /cache/recalculate
- Cache is cleared for a specific page via /cache/clear/:path
Note: Cache does NOT automatically expire. If documentation changes, you must manually recalculate.
1. Add the URL to the `urls` array in main.tsx:
   const urls = [
     // ... existing URLs
     "https://console.groq.com/docs/new-page.md",
   ];
2. Run recalculate:
   GET /cache/recalculate
3. Verify:
   GET /cache/stats
   GET /list   # Should include your new page
Token counts are calculated using tiktoken with the gpt-4 encoding (cl100k_base). This is the same encoding used by:
- GPT-4
 - GPT-3.5-turbo
 - Many other OpenAI models
 
Token counts are:
- Calculated on first fetch
 - Stored in cache
 - Returned in API responses
 - Expensive to compute (which is why caching is important)
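A minimal counting sketch, assuming the npm tiktoken package (the exact import the project uses may differ):

import { get_encoding } from "npm:tiktoken"; // WASM-backed tokenizer

// Count tokens with the cl100k_base encoding (the GPT-4 / GPT-3.5-turbo encoding).
function countTokens(text: string): number {
  const encoding = get_encoding("cl100k_base");
  try {
    return encoding.encode(text).length;
  } finally {
    encoding.free(); // release the WASM-side encoder
  }
}

console.log(countTokens("Hello from the Groq docs cache"));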
 
Each page can have AI-generated metadata using Groq's chat completions API:
- Categories: 2-4 broad categories (e.g., "API", "Authentication", "Models")
 - Tags: 5-10 specific tags/keywords
 - Use Cases: 2-4 practical use cases or scenarios
 - Questions: 5-10 questions users might ask
 
Metadata is generated during /cache/recalculate and stored in the cache.
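A sketch of what that generation step might look like; the prompt, model choice, and response handling here are assumptions, and the real logic lives in groq.ts.

// Ask Groq's OpenAI-compatible chat completions endpoint for page metadata as JSON.
async function generateMetadataSketch(pageContent: string) {
  const res = await fetch("https://api.groq.com/openai/v1/chat/completions", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${Deno.env.get("GROQ_API_KEY")}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "llama-3.1-8b-instant", // assumed model choice
      messages: [
        {
          role: "user",
          content:
            "Return JSON with keys categories, tags, useCases, questions for this documentation page:\n\n" +
            pageContent.slice(0, 8000), // keep the prompt a modest size
        },
      ],
      temperature: 0.2,
    }),
  });
  const data = await res.json();
  return JSON.parse(data.choices[0].message.content); // assumes the model returned valid JSON
}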
The API includes a search endpoint (/search) that allows you to search across all documentation pages.
Currently uses keyword matching:
- Searches in page titles and content
 - Boosts results matching metadata (tags, categories, use cases)
 - Returns ranked results with relevance scores
 - Includes content snippets around matches
 
The search system is designed to support embeddings-based semantic search:
- `generateEmbeddings()` - Generates embeddings (currently fake, ready for real API)
- `vectorSearch()` - Vector similarity search function (ready to use when embeddings are real)
- Will enable semantic understanding of queries (not just keyword matching)
 
Content embeddings are generated for each page. Currently using a fake implementation (deterministic 384-dimensional vectors) that's ready to be replaced with actual embeddings API when available.
Embeddings are:
- Generated during recalculation
 - Stored in cache
 - Will be used for semantic search and similarity matching (currently using keyword search)
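One way a deterministic fake embedding (and the cosine similarity that vectorSearch would rely on) might be sketched; this mirrors the idea described above, not the exact implementation.

// Deterministic 384-dimensional "embedding": seed a simple PRNG from the text so the same
// content always yields the same vector. Placeholder until a real embeddings API is wired in.
function fakeEmbedding(text: string, dims = 384): number[] {
  let seed = 0;
  for (let i = 0; i < text.length; i++) seed = (seed * 31 + text.charCodeAt(i)) >>> 0;
  const vec: number[] = [];
  for (let i = 0; i < dims; i++) {
    seed = (seed * 1664525 + 1013904223) >>> 0; // LCG step
    vec.push(seed / 0xffffffff);
  }
  return vec;
}

// Cosine similarity between two vectors, for embeddings-based ranking later on.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}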
 
Content hashes (SHA-256) are calculated and stored for each page. This enables:
- Smart recalculation: Skip unchanged pages automatically
 - Efficient updates: Only process pages that have actually changed
 - Performance: Significantly faster recalculation when most content is unchanged
 
Hashes are compared during /cache/recalculate (default mode) to determine if a page needs reprocessing.
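A sketch of the hash-and-compare step using the Web Crypto API available in Deno; the cached-hash argument stands in for the real cache lookup.

// Hash page content with SHA-256 and compare against the stored hash to decide whether to skip.
async function contentHash(content: string): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(content));
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}

async function shouldReprocess(content: string, cachedHash: string | null, force: boolean) {
  if (force) return true; // force mode ignores hash checks
  const hash = await contentHash(content);
  return cachedHash === null || hash !== cachedHash; // skip only when hashes match
}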
Run /cache/recalculate to refresh everything.
- Check /list to see if the path exists
- Verify the URL is in the `urls` array
- Ensure the path matches the URL structure (e.g., api-reference for /docs/api-reference.md)
- Clear cache for that page: GET /cache/clear/:path
- Request the page again: GET /page/:path
- Or recalculate everything: GET /cache/recalculate
- Use /page/:path endpoints (cached) instead of /data (uncached)
- Check cache stats: GET /cache/stats
- Ensure cache is populated before production use
 
The codebase is organized into modular files:
- `main.tsx` - Main Hono app, routes, and URL definitions
- `utils.ts` - Utility functions:
  - Cache management (getFromCache, setCache, clearCache, getCacheStats)
  - Content fetching (getTextFromUrl)
  - Frontmatter parsing (parseFrontmatter, addUrlSourceToFrontmatter)
  - Token counting (calculateTokenCount)
  - Hash calculation (calculateContentHash)
  - Rate limiting for fetches
 
- `groq.ts` - Groq API functions:
  - Chat completions (groqChatCompletion)
  - Metadata generation (generatePageMetadata)
 
- `search.ts` - Search and embeddings utilities:
  - Embeddings generation (generateEmbeddings) - fake implementation ready for real API
  - Search functions (searchPages) - keyword-based search (will use embeddings later)
  - Vector similarity search (vectorSearch) - ready for embeddings-based search
 
deno run --allow-net --allow-env main.tsx
Note: SQLite caching is automatically disabled when running locally (detected via valtown environment variable). The app will work without caching, but cache-related endpoints will return appropriate messages.
The app is configured to work with Val Town. Export uses:
export default (typeof Deno !== "undefined" && Deno.env.get("valtown")) ? app.fetch : app;
SQLite caching is automatically enabled when running in Val Town (detected via valtown environment variable).
- `GROQ_API_KEY` - Optional; required for AI metadata generation (metadata generation is disabled if not set)
- `valtown` - Automatically set by Val Town (used to detect the environment)
- Use default recalculate mode - Automatically skips unchanged pages
 - Cache is your friend - Always populate cache before production use
 - Rate limiting - Built-in rate limiting prevents WAF blocking (1 request per 3 seconds for docs, 2 requests per second for Groq API)
 - Hash checking - Default recalculation mode is much faster when most content is unchanged
 
