A Hono API server that fetches, caches, and processes Groq documentation pages with token counting and AI-generated metadata.
- Fetches documentation pages from Groq's console
- Caches page content, metadata, token counts, and embeddings in SQLite
- Token counting using tiktoken (GPT-4 encoding)
- AI-generated metadata (categories, tags, use cases, sample questions)
- Content embeddings generation (currently fake, ready for Groq API integration)
- Hash-based change detection to skip unchanged pages during recalculation
- Rate limiting with async-sema to avoid WAF blocking
- RESTful API endpoints for accessing pages and managing cache
- Modular code structure (utils.ts for utilities, groq.ts for Groq API functions)
On first run, the cache will be empty. You should populate it by running:
GET /cache/recalculate
This will:
- Fetch all pages from the URLs list
- Calculate token counts for each page
- Generate AI metadata (categories, tags, use cases, questions)
- Generate embeddings for each page
- Calculate content hashes for change detection
- Store everything in the SQLite cache
- Return a summary of what was cached
Important: This will take some time as it processes all pages, generates metadata, and calculates tokens for each. Be patient!
Note: On subsequent runs, unchanged pages (detected by content hash) will be automatically skipped unless you use force mode.
Check that the cache was populated:
GET /cache/stats
This returns:
{ "cachedPages": 121, "totalTokens": 1234567 }
You should run /cache/recalculate in these scenarios:
- First time setup - Cache is empty
- URL list changes - You've added or removed URLs from the `urls` array
- Content updates - Documentation pages have been updated and you want fresh data
- Token count needed - You need accurate token counts for new content
- Metadata refresh - You want to regenerate AI metadata or embeddings
By default, /cache/recalculate uses hash-based change detection:
GET /cache/recalculate
Behavior:
- Fetches each page and calculates its content hash (SHA-256)
- Compares hash with cached version
- Skips pages with unchanged content (saves time and API calls)
- Only processes pages that have changed
- Still generates embeddings and metadata for changed pages
Response includes:
- `processed` - Number of pages actually processed
- `skipped` - Number of pages skipped (unchanged)
- `force` - Always `false` in default mode
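To illustrate the skip decision, here is a rough sketch of the hash comparison. `calculateContentHash` and `getFromCache` are the utils.ts helpers listed later in this README, but their exact signatures are assumed here:

```ts
// Sketch of the default-mode skip decision (helper signatures are assumptions).
import { calculateContentHash, getFromCache } from "./utils.ts";

async function needsReprocessing(url: string, freshContent: string): Promise<boolean> {
  const newHash = await calculateContentHash(freshContent); // SHA-256 of the fetched content
  const cached = await getFromCache(url);                   // cached row, or null on a miss

  // Reprocess only when the page was never cached or its content hash changed.
  return !cached || cached.contentHash !== newHash;
}
```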
To force recalculation of all pages (ignoring hash checks):
GET /cache/recalculate?force=true
Use cases:
- Regenerating all metadata/embeddings even if content unchanged
- After updating metadata generation prompts
- When you want to ensure everything is fresh
For single page updates, you can use:
GET /cache/clear/:path
This clears the cache for a specific page. The next time that page is requested via /page/:path, it will be fetched fresh and recached.
- Weekly: Run recalculate (default mode) to catch any documentation updates efficiently
- After major docs changes: Use force mode to regenerate everything
- When adding new pages: Update the `urls` array, then run recalculate
Get the root docs page (cached if available).
Get a specific page by path. Examples:
- /page/api-reference
- /page/agentic-tooling/compound-beta
- /page/model/llama-3.1-8b-instant
Response includes:
- `url` - The source URL
- `content` - Full page content with frontmatter
- `charCount` - Character count
- `tokenCount` - Token count (calculated with tiktoken)
- All frontmatter fields flattened (title, description, image, etc.)
Caching: Responses are cached. First request fetches and caches, subsequent requests are instant.
Get a list of all available page paths.
Response:
[ "docs", "agentic-tooling", "api-reference", ... ]
Search pages by query string.
Query Parameters:
- `q` (required) - Search query string
- `limit` (optional) - Maximum number of results (default: 10)
- `minScore` (optional) - Minimum score threshold (default: 0)
Example:
GET /search?q=authentication&limit=5
Response:
{ "query": "authentication", "results": [ { "path": "api-reference", "url": "https://console.groq.com/docs/api-reference.md", "title": "API Reference", "score": 45, "snippet": "...authentication tokens are required for all API requests..." }, { "path": "quickstart", "url": "https://console.groq.com/docs/quickstart.md", "title": "Quick Start", "score": 32, "snippet": "...get your API key for authentication..." } ], "totalResults": 2, "totalPages": 121 }
Search Features:
- Keyword matching in titles and content
- Metadata boost (tags, categories, use cases)
- Score-based ranking
- Content snippets around matches
- Uses cached pages when available for faster results
Note: Currently uses keyword-based search. Future versions will use embeddings for semantic search.
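A quick client-side usage sketch (the base URL is a hypothetical assumption, not part of the project):

```ts
// Sketch: calling the /search endpoint from a client.
const params = new URLSearchParams({ q: "authentication", limit: "5", minScore: "10" });
const res = await fetch(`http://localhost:8000/search?${params}`);
const { results } = await res.json();

// Print the ranked hits.
for (const hit of results) {
  console.log(`${hit.score}\t${hit.path}\t${hit.title}`);
}
```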
Get metadata for all pages (does not use cache - fetches fresh).
Response:
{ "pages": [ { "url": "...", "charCount": 1234, "frontmatter": {...} } ], "contents": [...], "totalPages": 121, "totalChars": 1234567 }
Get cache statistics.
Response:
{ "cachedPages": 121, "totalTokens": 1234567 }
Clear the entire cache.
Response:
{ "message": "Cache cleared", "success": true }
Clear cache for a specific page.
Example:
GET /cache/clear/api-reference
Response:
{ "message": "Cache cleared for api-reference", "success": true }
Recalculate pages with AI metadata and embeddings generation.
Query Parameters:
- `force` (optional): Set to `true` to force recalculation of all pages, ignoring hash checks
Default Mode (no query params):
GET /cache/recalculate
Force Mode:
GET /cache/recalculate?force=true
Response (Default Mode):
{ "message": "Recalculated 5 pages, skipped 116 unchanged pages", "results": [ { "path": "api-reference", "url": "https://console.groq.com/docs/api-reference.md", "charCount": 1234, "tokenCount": 567, "title": "API Reference", "metadata": { "categories": ["API", "Reference"], "tags": ["api", "endpoints", "rest"], "useCases": ["Integrating with Groq API"], "questions": ["How do I authenticate?", "What endpoints are available?"] } }, { "path": "docs", "skipped": true, "reason": "Content unchanged (hash matches)" } ], "totalPages": 121, "processed": 5, "skipped": 116, "withMetadata": 5, "withoutMetadata": 0, "cached": true, "force": false }
Response (Force Mode):
{ "message": "Recalculated 121 pages with AI metadata (force mode)", "results": [...], "totalPages": 121, "processed": 121, "skipped": 0, "force": true }
What it does:
- Fetches all pages (or skips unchanged ones in default mode)
- Calculates token counts
- Generates AI metadata (categories, tags, use cases, questions)
- Generates embeddings (currently fake, ready for Groq API)
- Calculates content hashes for change detection
- Stores everything in cache
Important: This can take several minutes depending on:
- Number of pages to process (skipped pages are fast)
- Network speed
- Token calculation time
- AI metadata generation time (uses Groq API)
- First Request:
  - Check cache → Not found
  - Fetch from URL
  - Calculate tokens
  - Store in cache
  - Return data
- Subsequent Requests:
  - Check cache → Found
  - Return cached data immediately
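A simplified sketch of this cache-aside flow for the /page/:path handler; the helpers imported from utils.ts are those named later in this README, and their signatures (plus the handler body itself) are assumptions rather than the project's exact code:

```ts
import { Hono } from "npm:hono";
// Helpers named in utils.ts; signatures assumed for this sketch.
import { calculateTokenCount, getFromCache, getTextFromUrl, setCache } from "./utils.ts";

const app = new Hono();

// Simplified cache-aside flow for GET /page/:path.
app.get("/page/*", async (c) => {
  const path = c.req.path.replace("/page/", "");
  const url = `https://console.groq.com/docs/${path}.md`;

  // 1. Cache hit: return immediately.
  const cached = await getFromCache(url);
  if (cached) return c.json(cached);

  // 2. Cache miss: fetch, count tokens, store, then return.
  const content = await getTextFromUrl(url);
  const tokenCount = calculateTokenCount(content);
  const page = { url, content, charCount: content.length, tokenCount };
  await setCache(url, page);
  return c.json(page);
});
```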
Cache is stored in SQLite with the following schema:
```sql
CREATE TABLE groq_docs_cache_v3 (
  url TEXT PRIMARY KEY,
  content TEXT NOT NULL,
  charCount INTEGER NOT NULL,
  tokenCount INTEGER,
  frontmatter TEXT NOT NULL,
  metadata TEXT,
  contentHash TEXT,
  embeddings TEXT,
  cachedAt INTEGER NOT NULL
)
```
Fields:
- `url` - Source URL (primary key)
- `content` - Full page content with frontmatter
- `charCount` - Character count
- `tokenCount` - Token count (calculated with tiktoken)
- `frontmatter` - Parsed frontmatter (JSON)
- `metadata` - AI-generated metadata (categories, tags, use cases, questions)
- `contentHash` - SHA-256 hash of content (for change detection)
- `embeddings` - Content embeddings vector (JSON array)
- `cachedAt` - Timestamp when cached
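As a rough illustration, a cached page might be upserted into this table as shown below. The import path and call shape follow Val Town's std/sqlite client and are assumptions; adapt them to whatever driver the project actually uses:

```ts
import { sqlite } from "https://esm.town/v/std/sqlite"; // Val Town SQLite client (assumed)

// Example row; in the app these values come from fetching and processing a page.
const page = {
  url: "https://console.groq.com/docs/api-reference.md",
  content: "---\ntitle: API Reference\n---\n...",
  charCount: 1234,
  tokenCount: 567,
  frontmatter: { title: "API Reference" },
  metadata: { categories: ["API"], tags: ["api"] },
  contentHash: "d2b2f6...", // SHA-256 hex digest
  embeddings: [0.12, -0.34 /* ... 384 dims */],
};

// Upsert one cached page; JSON fields are stored as serialized text.
await sqlite.execute({
  sql: `INSERT OR REPLACE INTO groq_docs_cache_v3
        (url, content, charCount, tokenCount, frontmatter, metadata, contentHash, embeddings, cachedAt)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)`,
  args: [
    page.url, page.content, page.charCount, page.tokenCount,
    JSON.stringify(page.frontmatter), JSON.stringify(page.metadata),
    page.contentHash, JSON.stringify(page.embeddings), Date.now(),
  ],
});
```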
Cache is invalidated when:
- You manually clear it via /cache/clear
- You recalculate via /cache/recalculate
- Cache is cleared for a specific page via /cache/clear/:path
Note: Cache does NOT automatically expire. If documentation changes, you must manually recalculate.
- Add the URL to the `urls` array in `main.tsx`:

  ```ts
  const urls = [
    // ... existing URLs
    "https://console.groq.com/docs/new-page.md",
  ];
  ```

- Run recalculate:

  GET /cache/recalculate

- Verify:

  GET /cache/stats
  GET /list   # Should include your new page
Token counts are calculated using tiktoken with the gpt-4 encoding (cl100k_base). This is the same encoding used by:
- GPT-4
- GPT-3.5-turbo
- Many other OpenAI models
Token counts are:
- Calculated on first fetch
- Stored in cache
- Returned in API responses
- Expensive to compute (which is why caching is important)
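A token-counting sketch is shown below. It uses the pure-JS js-tiktoken package for portability in Deno, whereas the project may use the WASM tiktoken package instead; either way, the gpt-4 / cl100k_base encoding is what matters:

```ts
import { encodingForModel } from "npm:js-tiktoken"; // pure-JS port; the project may use npm:tiktoken

// Count tokens the same way GPT-4 does (cl100k_base encoding).
function countTokens(text: string): number {
  const enc = encodingForModel("gpt-4");
  return enc.encode(text).length;
}

console.log(countTokens("Groq documentation is fast to read."));
```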
Each page can have AI-generated metadata using Groq's chat completions API:
- Categories: 2-4 broad categories (e.g., "API", "Authentication", "Models")
- Tags: 5-10 specific tags/keywords
- Use Cases: 2-4 practical use cases or scenarios
- Questions: 5-10 questions users might ask
Metadata is generated during /cache/recalculate and stored in the cache.
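A hedged sketch of what metadata generation might look like: the endpoint is Groq's OpenAI-compatible chat completions API, while the model choice, prompt, and response handling here are illustrative assumptions rather than the project's actual `generatePageMetadata` implementation:

```ts
// Sketch of AI metadata generation via Groq's OpenAI-compatible chat completions API.
async function generateMetadataSketch(content: string) {
  const res = await fetch("https://api.groq.com/openai/v1/chat/completions", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${Deno.env.get("GROQ_API_KEY")}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "llama-3.1-8b-instant", // illustrative model choice
      messages: [{
        role: "user",
        content:
          "Return only JSON with keys categories, tags, useCases, questions for this page:\n\n" +
          content.slice(0, 8000), // keep the prompt within context limits
      }],
    }),
  });

  const data = await res.json();
  return JSON.parse(data.choices[0].message.content);
}
```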
The API includes a search endpoint (/search) that allows you to search across all documentation pages.
Currently uses keyword matching:
- Searches in page titles and content
- Boosts results matching metadata (tags, categories, use cases)
- Returns ranked results with relevance scores
- Includes content snippets around matches
The search system is designed to support embeddings-based semantic search:
- `generateEmbeddings()` - Generates embeddings (currently fake, ready for real API)
- `vectorSearch()` - Vector similarity search function (ready to use when embeddings are real)
- Will enable semantic understanding of queries (not just keyword matching)
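A plausible shape for `vectorSearch` once real embeddings exist is cosine similarity over the cached vectors. The sketch below is an assumption about how it could work, not the project's actual implementation:

```ts
// Cosine similarity between two embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB) || 1);
}

// Rank cached pages by similarity to a query embedding (illustrative).
function vectorSearchSketch(
  queryEmbedding: number[],
  pages: { path: string; embeddings: number[] }[],
  limit = 10,
) {
  return pages
    .map((p) => ({ path: p.path, score: cosineSimilarity(queryEmbedding, p.embeddings) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```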
Content embeddings are generated for each page. Currently using a fake implementation (deterministic 384-dimensional vectors) that's ready to be replaced with actual embeddings API when available.
Embeddings are:
- Generated during recalculation
- Stored in cache
- Will be used for semantic search and similarity matching (currently using keyword search)
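One way to produce deterministic placeholder vectors is to seed a simple PRNG from the content; the sketch below shows the idea, not the project's exact fake implementation:

```ts
// Deterministic 384-dimensional placeholder embedding (illustrative only).
function fakeEmbedding(content: string, dims = 384): number[] {
  // Derive a numeric seed from the content so the same text always yields the same vector.
  let seed = 0;
  for (let i = 0; i < content.length; i++) {
    seed = (seed * 31 + content.charCodeAt(i)) >>> 0;
  }

  // Tiny linear congruential generator producing values in [-1, 1).
  const vector: number[] = [];
  for (let i = 0; i < dims; i++) {
    seed = (seed * 1664525 + 1013904223) >>> 0;
    vector.push((seed / 0xffffffff) * 2 - 1);
  }
  return vector;
}
```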
Content hashes (SHA-256) are calculated and stored for each page. This enables:
- Smart recalculation: Skip unchanged pages automatically
- Efficient updates: Only process pages that have actually changed
- Performance: Significantly faster recalculation when most content is unchanged
Hashes are compared during /cache/recalculate (default mode) to determine if a page needs reprocessing.
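Computing the hash needs nothing beyond the Web Crypto API available in Deno; a sketch of what `calculateContentHash` might look like (the actual utils.ts implementation may differ):

```ts
// SHA-256 hex digest of page content via the Web Crypto API.
async function calculateContentHashSketch(content: string): Promise<string> {
  const bytes = new TextEncoder().encode(content);
  const digest = await crypto.subtle.digest("SHA-256", bytes);
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}
```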
Run /cache/recalculate to refresh everything.
- Check /list to see if the path exists
- Verify the URL is in the `urls` array
- Ensure the path matches the URL structure (e.g., `api-reference` for `/docs/api-reference.md`)
- Clear cache for that page: GET /cache/clear/:path
- Request the page again: GET /page/:path
- Or recalculate everything: GET /cache/recalculate
- Use /page/:path endpoints (cached) instead of /data (uncached)
- Check cache stats: GET /cache/stats
- Ensure cache is populated before production use
The codebase is organized into modular files:
- `main.tsx` - Main Hono app, routes, and URL definitions
- `utils.ts` - Utility functions:
  - Cache management (getFromCache, setCache, clearCache, getCacheStats)
  - Content fetching (getTextFromUrl)
  - Frontmatter parsing (parseFrontmatter, addUrlSourceToFrontmatter)
  - Token counting (calculateTokenCount)
  - Hash calculation (calculateContentHash)
  - Rate limiting for fetches
- `groq.ts` - Groq API functions:
  - Chat completions (groqChatCompletion)
  - Metadata generation (generatePageMetadata)
- `search.ts` - Search and embeddings utilities:
  - Embeddings generation (generateEmbeddings) - fake implementation ready for real API
  - Search functions (searchPages) - keyword-based search (will use embeddings later)
  - Vector similarity search (vectorSearch) - ready for embeddings-based search
deno run --allow-net --allow-env main.tsx
Note: SQLite caching is automatically disabled when running locally (detected via valtown environment variable). The app will work without caching, but cache-related endpoints will return appropriate messages.
The app is configured to work with Val Town. Export uses:
export default (typeof Deno !== "undefined" && Deno.env.get("valtown")) ? app.fetch : app;
SQLite caching is automatically enabled when running in Val Town (detected via valtown environment variable).
- `GROQ_API_KEY` - Used for AI metadata generation (optional; metadata generation is disabled if it is not set)
- `valtown` - Automatically set by Val Town (used to detect the environment)
- Use default recalculate mode - Automatically skips unchanged pages
- Cache is your friend - Always populate cache before production use
- Rate limiting - Built-in rate limiting prevents WAF blocking (1 request per 3 seconds for docs, 2 requests per second for Groq API)
- Hash checking - Default recalculation mode is much faster when most content is unchanged
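The rate limits quoted above map naturally onto async-sema's RateLimit helper. The numbers below follow the stated limits, but the actual configuration in utils.ts may differ:

```ts
import { RateLimit } from "npm:async-sema";

// One docs fetch every 3 seconds (1 request per 3000 ms time unit) to avoid WAF blocking.
const docsLimit = RateLimit(1, { timeUnit: 3000 });

// Two Groq API calls per second.
const groqLimit = RateLimit(2);

async function fetchDocPage(url: string): Promise<string> {
  await docsLimit(); // wait for a slot before hitting the docs site
  const res = await fetch(url);
  return await res.text();
}
```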
