A Hono API server that fetches, caches, and processes Groq documentation pages with token counting and AI-generated metadata.
On first run, the cache will be empty. You should populate it by running:
GET /cache/recalculate
This will fetch every page, generate AI metadata and embeddings, and calculate token counts for each.

Important: This will take some time as it processes all pages, generates metadata, and calculates tokens for each. Be patient!
Note: On subsequent runs, unchanged pages (detected by content hash) will be automatically skipped unless you use force mode.
Check that the cache was populated:
GET /cache/stats
This returns:
{ "cachedPages": 121, "totalTokens": 1234567 }
You should run /cache/recalculate in these scenarios:

- After adding new URLs to the urls array
- After upstream documentation content changes
- After switching search strategies (to regenerate embeddings)

By default, /cache/recalculate uses hash-based change detection:

GET /cache/recalculate
Behavior: each page's content hash is compared against the stored hash; unchanged pages are skipped and only new or changed pages are reprocessed.

Response includes:

- processed - Number of pages actually processed
- skipped - Number of pages skipped (unchanged)
- force - Always false in default mode

To force recalculation of all pages (ignoring hash checks):
GET /cache/recalculate?force=true
Use cases:

- Regenerating metadata or embeddings after switching strategies
- Recovering from a stale or corrupted cache
For single page updates, you can use:
GET /cache/clear/:path
This clears the cache for a specific page. The next time that page is requested via /page/:path, it will be fetched fresh and recached.
To add a new page, add its URL to the urls array, then run recalculate.

Get the root docs page (cached if available).
Get a specific page by path. Examples:
Get a specific page by path. Examples:

- /page/api-reference
- /page/agentic-tooling/compound-beta
- /page/model/llama-3.1-8b-instant

Response includes:

- url - The source URL
- content - Full page content with frontmatter
- charCount - Character count
- tokenCount - Token count (calculated with tiktoken)

Caching: Responses are cached. The first request fetches and caches; subsequent requests are instant.
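A consumer-side sketch of the page response shape and URL building, with field types inferred from the list above — hypothetical, not the server's actual type definitions:

```typescript
// Hypothetical shape of a /page/:path response (inferred, not the server's types).
interface PageResponse {
  url: string;        // source URL
  content: string;    // full page content with frontmatter
  charCount: number;  // character count
  tokenCount: number; // calculated with tiktoken
}

// Build the request URL for a page path, tolerating a leading slash.
function pageUrl(baseUrl: string, path: string): string {
  return `${baseUrl}/page/${path.replace(/^\/+/, "")}`;
}

// Hypothetical client call; baseUrl is a placeholder for your deployment.
async function getPage(baseUrl: string, path: string): Promise<PageResponse> {
  const res = await fetch(pageUrl(baseUrl, path));
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res.json();
}
```

For example, pageUrl("https://example.dev", "api-reference") yields "https://example.dev/page/api-reference".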
Get a list of all available page paths.
Response:
[ "docs", "agentic-tooling", "api-reference", ... ]
Search pages by query string.
Query Parameters:
- q (required) - Search query string
- limit (optional) - Maximum number of results (default: 10)
- minScore (optional) - Minimum score threshold (default: 0)

Example:
GET /search?q=authentication&limit=5
Response:
{
  "query": "authentication",
  "results": [
    {
      "path": "api-reference",
      "url": "https://console.groq.com/docs/api-reference.md",
      "title": "API Reference",
      "score": 45,
      "snippet": "...authentication tokens are required for all API requests..."
    },
    {
      "path": "quickstart",
      "url": "https://console.groq.com/docs/quickstart.md",
      "title": "Quick Start",
      "score": 32,
      "snippet": "...get your API key for authentication..."
    }
  ],
  "totalResults": 2,
  "totalPages": 121
}
Search Features:

- Semantic matching via embeddings, not just keyword overlap
- Relevance scores and contextual snippets for each result

Note: The default is embeddings-based semantic search. Multiple strategies are available (see the Search section).
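The query parameters above can be parsed with the standard URL API. This is a hypothetical sketch of the parsing and defaulting logic — the real handler lives in main.tsx and may differ:

```typescript
// Hypothetical sketch of /search query parsing with the documented defaults.
// The actual handler in main.tsx may differ.
interface SearchParams {
  q: string;
  limit: number;    // default: 10
  minScore: number; // default: 0
}

function parseSearchParams(requestUrl: string): SearchParams {
  const params = new URL(requestUrl).searchParams;
  const q = params.get("q");
  if (!q) throw new Error("q is required");
  return {
    q,
    limit: Number(params.get("limit") ?? 10),
    minScore: Number(params.get("minScore") ?? 0),
  };
}
```

For example, parsing "/search?q=authentication&limit=5" yields q "authentication", limit 5, and minScore 0.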
Answer questions using RAG (Retrieval-Augmented Generation).
Query Parameters:
- q (required) - Question to answer
- limit (optional) - Max search results to consider (default: 10)
- minScore (optional) - Minimum search score threshold (default: 0)
- maxContextPages (optional) - Max pages to include in LLM context (default: 5)
- temperature (optional) - LLM temperature 0-1 (default: 0.3)
- model (optional) - Override LLM model (default: llama-3.3-70b-versatile)

Example:
GET /answer?q=How+do+I+authenticate+with+the+API&maxContextPages=5
Response:
{
  "answer": "To authenticate with the Groq API, you need to...",
  "query": "How do I authenticate with the API?",
  "searchResults": [
    {
      "path": "api-reference",
      "url": "https://console.groq.com/docs/api-reference.md",
      "title": "API Reference",
      "score": 92.5
    }
  ],
  "contextUsed": 5,
  "totalTokens": 8500,
  "metadata": {
    "strategy": "llama-3.3-70b-default",
    "model": "llama-3.3-70b-versatile",
    "temperature": 0.3,
    "searchResultsCount": 10,
    "timings": { "search": 45.2, "contextPrep": 5.1, "llm": 1250.3, "total": 1300.6 }
  }
}
How it works:

1. Searches the docs using the active search strategy
2. Selects the top-scoring pages (up to maxContextPages) as context
3. Sends the question plus context to the LLM to generate an answer

See the /answer/ folder for available strategies and documentation.
Get information about the active answer strategy.
Response:
{
  "strategy": {
    "name": "llama-3.3-70b-default",
    "description": "RAG using active search strategy + Llama 3.3 70B with up to 5 doc pages in context"
  },
  "defaultOptions": {
    "model": "llama-3.3-70b-versatile",
    "temperature": 0.3,
    "maxContextPages": 5
  },
  "availableParams": {
    "q": "Query string (required)",
    "limit": "Max search results to consider (default: 10)",
    "minScore": "Minimum search score threshold (default: 0)",
    "maxContextPages": "Max pages to include in LLM context (default: 5)",
    "temperature": "LLM temperature (default: 0.3)",
    "model": "Override LLM model (optional)"
  }
}
Run test queries against the active answer strategy.
Response:
{
  "strategy": { "name": "llama-3.3-70b-default", "description": "..." },
  "totalQueries": 1,
  "tests": [
    {
      "query": "What is Compound and how does it work?",
      "answer": "markdown formatted answer...",
      "searchResults": [
        {
          "path": "agentic-tooling/compound-beta",
          "url": "https://...",
          "title": "Compound",
          "score": 95.2
        }
      ],
      "contextUsed": 5,
      "totalTokens": 8500,
      "durationMs": 1250.5,
      "timings": { "search": 45.2, "contextPrep": 5.1, "llm": 1200.2, "total": 1250.5 }
    }
  ],
  "summary": {
    "totalDurationMs": 1250.5,
    "avgDurationMs": 1250.5,
    "avgSearchMs": 45.2,
    "avgContextPrepMs": 5.1,
    "avgLlmMs": 1200.2,
    "avgTotalMs": 1250.5,
    "totalContextUsed": 5,
    "totalTokens": 8500,
    "errors": 0
  }
}
Get metadata for all pages (does not use cache - fetches fresh).
Response:
{ "pages": [ { "url": "...", "charCount": 1234, "frontmatter": {...} } ], "contents": [...], "totalPages": 121, "totalChars": 1234567 }
Get cache statistics.
Response:
{ "cachedPages": 121, "totalTokens": 1234567 }
Clear the entire cache.
Response:
{ "message": "Cache cleared", "success": true }
Clear cache for a specific page.
Example:
GET /cache/clear/api-reference
Response:
{ "message": "Cache cleared for api-reference", "success": true }
Recalculate pages with AI metadata and embeddings generation.
Query Parameters:
- force (optional) - Set to true to force recalculation of all pages, ignoring hash checks

Default Mode (no query params):
GET /cache/recalculate
Force Mode:
GET /cache/recalculate?force=true
Response (Default Mode):
{
  "message": "Recalculated 5 pages, skipped 116 unchanged pages",
  "results": [
    {
      "path": "api-reference",
      "url": "https://console.groq.com/docs/api-reference.md",
      "charCount": 1234,
      "tokenCount": 567,
      "title": "API Reference",
      "metadata": {
        "categories": ["API", "Reference"],
        "tags": ["api", "endpoints", "rest"],
        "useCases": ["Integrating with Groq API"],
        "questions": ["How do I authenticate?", "What endpoints are available?"]
      }
    },
    {
      "path": "docs",
      "skipped": true,
      "reason": "Content unchanged (hash matches)"
    }
  ],
  "totalPages": 121,
  "processed": 5,
  "skipped": 116,
  "withMetadata": 5,
  "withoutMetadata": 0,
  "cached": true,
  "force": false
}
Response (Force Mode):
{
  "message": "Recalculated 121 pages with AI metadata (force mode)",
  "results": [...],
  "totalPages": 121,
  "processed": 121,
  "skipped": 0,
  "force": true
}
What it does:

1. Fetches every page in the urls array
2. Computes a content hash and skips unchanged pages (unless force mode)
3. Generates AI metadata (categories, tags, use cases, questions)
4. Generates content embeddings
5. Counts tokens and writes everything to the cache

Important: This can take several minutes depending on the number of pages, LLM response times for metadata generation, and embedding generation speed.
First Request: the page is fetched from the source, processed, and cached.

Subsequent Requests: the page is served directly from the cache.
Cache is stored in SQLite with the following schema:
CREATE TABLE groq_docs_cache_v3 (
url TEXT PRIMARY KEY,
content TEXT NOT NULL,
charCount INTEGER NOT NULL,
tokenCount INTEGER,
frontmatter TEXT NOT NULL,
metadata TEXT,
contentHash TEXT,
embeddings TEXT,
cachedAt INTEGER NOT NULL
)
Fields:
Fields:

- url - Source URL (primary key)
- content - Full page content with frontmatter
- charCount - Character count
- tokenCount - Token count (calculated with tiktoken)
- frontmatter - Parsed frontmatter (JSON)
- metadata - AI-generated metadata (categories, tags, use cases, questions)
- contentHash - SHA-256 hash of content (for change detection)
- embeddings - Content embeddings vector (JSON array)
- cachedAt - Timestamp when cached

Cache is invalidated when:

- /cache/clear is called (entire cache)
- /cache/recalculate is run
- /cache/clear/:path is called (single page)

Note: The cache does NOT automatically expire. If documentation changes, you must manually recalculate.
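The JSON-typed columns (frontmatter, metadata, embeddings) are stored as TEXT, so reads must deserialize them. A hypothetical sketch of that step — the field shapes here are inferred from the schema, not the project's actual types:

```typescript
// Hypothetical row shape for groq_docs_cache_v3; JSON columns are stored as TEXT.
// Inferred from the schema above, not the project's actual type definitions.
interface CacheRow {
  url: string;
  content: string;
  charCount: number;
  tokenCount: number | null;
  frontmatter: string;       // JSON text
  metadata: string | null;   // JSON text
  contentHash: string | null;
  embeddings: string | null; // JSON array text
  cachedAt: number;
}

// Parse the JSON columns into usable values; nullable columns stay null.
function deserializeRow(row: CacheRow) {
  return {
    ...row,
    frontmatter: JSON.parse(row.frontmatter),
    metadata: row.metadata ? JSON.parse(row.metadata) : null,
    embeddings: row.embeddings ? (JSON.parse(row.embeddings) as number[]) : null,
  };
}
```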
Add URL to the urls array in main.tsx:
const urls = [
// ... existing URLs
"https://console.groq.com/docs/new-page.md",
];
Run recalculate:
POST /cache/recalculate
Verify:
GET /cache/stats
GET /list    # Should include your new page
Token counts are calculated using tiktoken with the gpt-4 encoding (cl100k_base), the same encoding used by OpenAI's GPT-4 family of models.

Token counts are:

- Calculated once when a page is cached
- Stored in the cache and returned with each page response
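tiktoken gives exact counts; for quick approximations (such as the token estimation helper mentioned under answer/utils.ts), a common heuristic is roughly four characters per token for English text. A hedged sketch — the actual helper's implementation is not shown in this document:

```typescript
// Rough token estimate: ~4 characters per token for English text.
// This is an approximation only; the cache stores exact tiktoken counts.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}
```

For example, a 400-character string estimates to about 100 tokens; exact counts vary with the content.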
Each page can have AI-generated metadata, produced with Groq's chat completions API:

- categories - High-level topic categories
- tags - Keyword tags
- useCases - Common use cases the page supports
- questions - Questions the page can answer
Metadata is generated during /cache/recalculate and stored in the cache.
The API includes a search endpoint (/search) that allows you to search across all documentation pages using various semantic search strategies.
The search system supports multiple strategies that can be switched by commenting/uncommenting imports in search/index.ts. Each strategy has different trade-offs in terms of speed, accuracy, and infrastructure requirements.
File: search/transformers-local-onnx.ts
Pre-downloaded ONNX models for the fastest embedding generation with zero network overhead.
Performance: ~10-30ms per query (after initial ~50ms model load)
Advantages:

- Fastest option: no network calls at query time
- Works offline once the model files are downloaded
- No API keys required
Setup:
cd search/models
./download-model.sh
Then point search/index.ts at the local strategy:
import { searchStrategy, generateEmbeddings } from "./transformers-local-onnx.ts";
Requirements: ~23MB disk space for model files
See search/models/SETUP.md for detailed setup instructions.
File: search/transformers-cosine.ts
Uses Transformers.js with automatic model downloading from Hugging Face.
Performance: the first query is slower while the model downloads; subsequent queries run locally.

Advantages:

- No API keys required
- Runs entirely locally after the initial download

Disadvantages:

- First run requires downloading the model from Hugging Face
- Higher cold-start latency than the pre-downloaded ONNX strategy
The hosted API strategies all require API keys but offer different trade-offs:
| Strategy | File | Speed | Cost | Pros |
|---|---|---|---|---|
| Mixedbread | mixedbread-embeddings-cosine.ts | ~50-100ms | Free tier | High quality, 1024 dims |
| OpenAI | openai-cosine.ts | ~100-200ms | Paid | High quality, reliable |
| HuggingFace | hf-inference-qwen3-cosine.ts | ~150-300ms | Free tier | Qwen3-8B model |
| Cloudflare | cloudflare-bge-cosine.ts | ~50-150ms | Free tier | Works on CF Workers |
| JigsawStack | jigsawstack-orama.ts | ~550ms | Free tier | Managed search |
Edit search/index.ts and comment/uncomment the desired strategy:
// Comment out current strategy
// import { searchStrategy, generateEmbeddings } from "./transformers-cosine.ts";
// Uncomment desired strategy
import { searchStrategy, generateEmbeddings } from "./transformers-local-onnx.ts";
The search system uses semantic embeddings for intelligent search:
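Ranking with embeddings boils down to cosine similarity between the query vector and each cached page vector (the document notes a cosine similarity utility in search/utils.ts; this sketch is illustrative, not the actual code):

```typescript
// Cosine similarity between two embedding vectors: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank pages by similarity to a query embedding, highest score first.
function rankBySimilarity(
  query: number[],
  pages: { path: string; embedding: number[] }[],
) {
  return pages
    .map((p) => ({ path: p.path, score: cosineSimilarity(query, p.embedding) }))
    .sort((x, y) => y.score - x.score);
}
```

Identical vectors score 1, orthogonal vectors score 0, so higher scores mean closer semantic matches.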
The API includes answer generation using Retrieval-Augmented Generation (RAG) - combining semantic search with LLM inference to answer questions about the documentation.
Answer strategies are located in the /answer/ folder and can be switched by editing answer/index.ts.
File: answer/llama-3.3-70b-default.ts
Uses Groq's Llama 3.3 70B model with up to 5 documentation pages in context.
Performance: ~1-3s total (depends on search + LLM response time)
Configuration:
- Model: llama-3.3-70b-versatile
- Temperature: 0.3 (default)
- Up to 5 documentation pages in context (default)

Advantages:

- High-quality answers from a large model
- Sensible defaults that can be overridden per request
Usage:
# Basic question
GET /answer?q=How+do+I+use+streaming

# With options
GET /answer?q=What+models+are+available&maxContextPages=3&temperature=0.5

# Different model
GET /answer?q=Quick+question&model=llama-3.1-8b-instant
You can create custom strategies for different use cases:
Ideas for new strategies:

- Smaller or faster models for quick answers
- Multi-step strategies (query rewriting, reranking)
- Different context sizes or prompt templates
See answer/README.md and answer/QUICK-START.md for detailed documentation and guides.
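The pluggable design implies a small shared interface plus defaulting logic. A hypothetical sketch — the real definitions live in answer/types.ts and answer/index.ts and may differ:

```typescript
// Hypothetical strategy interface; actual definitions live in answer/types.ts.
interface AnswerOptions {
  limit?: number;
  minScore?: number;
  maxContextPages?: number;
  temperature?: number;
  model?: string;
}

interface AnswerStrategy {
  name: string;
  description: string;
  answer(query: string, options?: AnswerOptions): Promise<unknown>;
}

// Merge caller-supplied options over the documented defaults.
function withDefaults(options: AnswerOptions = {}): Required<AnswerOptions> {
  return {
    limit: options.limit ?? 10,
    minScore: options.minScore ?? 0,
    maxContextPages: options.maxContextPages ?? 5,
    temperature: options.temperature ?? 0.3,
    model: options.model ?? "llama-3.3-70b-versatile",
  };
}
```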
Each strategy implements the shared answer interface and reuses the active search strategy from /search/.

For better answers:

- Increase maxContextPages (more documentation context)
- Increase minScore (only use highly relevant pages)

For faster responses:

- Decrease maxContextPages (less context to process)
- Use a smaller model such as llama-3.1-8b-instant

For creative responses:

- Increase temperature
Content embeddings are generated for each page using the active search strategy (see Search section above).
Current Default: Local ONNX models (transformers-local-onnx.ts)
Embeddings are:

- Generated during /cache/recalculate
- Stored in the cache as JSON arrays

Content hashes (SHA-256) are calculated and stored for each page. This enables:

- Skipping unchanged pages during recalculation
- Detecting when documentation content has changed
Hashes are compared during /cache/recalculate (default mode) to determine if a page needs reprocessing.
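The hash comparison can be sketched with node:crypto (available in Deno via Node compatibility). The actual hashing code is not shown in this document, so treat this as illustrative:

```typescript
import { createHash } from "node:crypto";

// SHA-256 hex digest of page content, used for change detection.
function contentHash(content: string): string {
  return createHash("sha256").update(content).digest("hex");
}

// A page needs reprocessing when force mode is on, no hash is stored yet,
// or the fresh content no longer matches the stored hash.
function needsReprocessing(
  freshContent: string,
  storedHash: string | null,
  force = false,
): boolean {
  if (force || !storedHash) return true;
  return contentHash(freshContent) !== storedHash;
}
```

This is why default mode skips unchanged pages while force=true reprocesses everything regardless of hashes.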
Run /cache/recalculate to refresh everything.
If a page is missing or stale:

- Check /list to see if the path exists
- Make sure the URL is in the urls array
- Use the path without the docs prefix (e.g. api-reference for /docs/api-reference.md)
- Clear the page with POST /cache/clear/:path, then request GET /page/:path to recache it
- Or run POST /cache/recalculate to refresh everything
- Prefer the /page/:path endpoints (cached) instead of /data (uncached)
- Check GET /cache/stats to confirm the cache is populated

The codebase is organized into modular files:
- main.tsx - Main Hono app, routes, and URL definitions
- utils.ts - Utility functions
- groq.ts - Groq API functions
- search/ - Search strategies with pluggable implementations:
  - index.ts - Main entry point, switches between strategies
  - types.ts - Type definitions for search
  - utils.ts - Shared utilities (cosine similarity, snippets)
- answer/ - Answer strategies with pluggable RAG implementations:
  - index.ts - Main entry point, switches between strategies
  - types.ts - Type definitions for answers
  - utils.ts - Shared utilities (context formatting, token estimation)

# Start the server
deno task serve

# Or manually
deno run --allow-net --allow-env main.tsx
Note: SQLite caching is automatically disabled when running locally (detected via valtown environment variable). The app will work without caching, but cache-related endpoints will return appropriate messages.
The project includes several convenience tasks defined in deno.json:
# Start the development server
deno task serve
# Recalculate with active search strategy (smart mode, skips unchanged pages)
deno task recalc

# Force recalculation (recalculates all pages)
deno task recalc-f

# Recalculate with Mixedbread embeddings strategy
deno task recalc-mxbai

# Force recalculation with Mixedbread embeddings
deno task recalc-mxbai-f
# Test search strategy with detailed timing breakdown
deno task search

# Test answer strategy with detailed timing breakdown and search results
deno task answer
Test Output Features:

- Timing breakdown for each phase (search, context prep, LLM call)
- Top search results with paths and scores
- The generated answer
Example test output:
ā±ļø Timing breakdown:
Search: 45.2ms
Context prep: 5.1ms
LLM call: 1200.2ms
Total: 1250.5ms
š Search results used (top 5):
ā 1. Compound
Path: agentic-tooling/compound-beta
Score: 95.20
š¬ Generated Answer:
āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā
Compound is a beta feature...
(answer continues)
āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā
The app is configured to work with Val Town. Export uses:
export default (typeof Deno !== "undefined" && Deno.env.get("valtown")) ? app.fetch : app;
SQLite caching is automatically enabled when running in Val Town (detected via valtown environment variable).
- GROQ_API_KEY - Needed for AI metadata generation (optional; metadata generation is disabled if not set)
- valtown - Automatically set by Val Town (used to detect the environment)