Groq Docs API

A Hono API server that fetches, caches, and processes Groq documentation pages with token counting and AI-generated metadata.

Features

  • Fetches documentation pages from Groq's console
  • Caches page content, metadata, token counts, and embeddings in SQLite
  • Token counting using tiktoken (GPT-4 encoding)
  • AI-generated metadata (categories, tags, use cases, sample questions)
  • Content embeddings generation with multiple strategies (local ONNX, Transformers.js, API-based)
  • Semantic search with configurable strategies (embeddings + cosine similarity)
  • RAG-based question answering with configurable answer strategies (search + LLM)
  • Hash-based change detection to skip unchanged pages during recalculation
  • Rate limiting with async-sema to avoid WAF blocking
  • RESTful API endpoints for accessing pages, search, and Q&A
  • Modular code structure with pluggable strategies

First-Time Setup

1. Initial Cache Population

On first run, the cache will be empty. You should populate it by running:

GET /cache/recalculate

This will:

  • Fetch all pages from the URLs list
  • Calculate token counts for each page
  • Generate AI metadata (categories, tags, use cases, questions)
  • Generate embeddings for each page
  • Calculate content hashes for change detection
  • Store everything in the SQLite cache
  • Return a summary of what was cached

Important: This will take some time as it processes all pages, generates metadata, and calculates tokens for each. Be patient!

Note: On subsequent runs, unchanged pages (detected by content hash) will be automatically skipped unless you use force mode.

2. Verify Cache

Check that the cache was populated:

GET /cache/stats

This returns:

{ "cachedPages": 121, "totalTokens": 1234567 }

When to Recalculate

You should run /cache/recalculate in these scenarios:

✅ Required Recalculations

  1. First time setup - Cache is empty
  2. URL list changes - You've added or removed URLs from the urls array
  3. Content updates - Documentation pages have been updated and you want fresh data
  4. Token count needed - You need accurate token counts for new content
  5. Metadata refresh - You want to regenerate AI metadata or embeddings

🔄 Default Mode (Smart Recalculation)

By default, /cache/recalculate uses hash-based change detection:

GET /cache/recalculate

Behavior:

  • Fetches each page and calculates its content hash (SHA-256)
  • Compares hash with cached version
  • Skips pages with unchanged content (saves time and API calls)
  • Only processes pages that have changed
  • Still generates embeddings and metadata for changed pages

Response includes:

  • processed - Number of pages actually processed
  • skipped - Number of pages skipped (unchanged)
  • force - Always false in default mode

⚡ Force Mode (Recalculate Everything)

To force recalculation of all pages (ignoring hash checks):

GET /cache/recalculate?force=true

Use cases:

  • Regenerating all metadata/embeddings even if content unchanged
  • After updating metadata generation prompts
  • When you want to ensure everything is fresh

⚠️ Partial Updates

For single page updates, you can use:

GET /cache/clear/:path

This clears the cache for a specific page. The next time that page is requested via /page/:path, it will be fetched fresh and recached.

🔄 Routine Maintenance

  • Weekly: Run recalculate (default mode) to catch any documentation updates efficiently
  • After major docs changes: Use force mode to regenerate everything
  • When adding new pages: Update the urls array, then run recalculate

API Endpoints

Page Endpoints

GET /page/docs

Get the root docs page (cached if available).

GET /page/:path

Get a specific page by path. Examples:

  • /page/api-reference
  • /page/agentic-tooling/compound-beta
  • /page/model/llama-3.1-8b-instant

Response includes:

  • url - The source URL
  • content - Full page content with frontmatter
  • charCount - Character count
  • tokenCount - Token count (calculated with tiktoken)
  • All frontmatter fields flattened (title, description, image, etc.)

Caching: Responses are cached. First request fetches and caches, subsequent requests are instant.

GET /list

Get a list of all available page paths.

Response:

[ "docs", "agentic-tooling", "api-reference", ... ]

GET /search

Search pages by query string.

Query Parameters:

  • q (required) - Search query string
  • limit (optional) - Maximum number of results (default: 10)
  • minScore (optional) - Minimum score threshold (default: 0)

Example:

GET /search?q=authentication&limit=5

Response:

{ "query": "authentication", "results": [ { "path": "api-reference", "url": "https://console.groq.com/docs/api-reference.md", "title": "API Reference", "score": 45, "snippet": "...authentication tokens are required for all API requests..." }, { "path": "quickstart", "url": "https://console.groq.com/docs/quickstart.md", "title": "Quick Start", "score": 32, "snippet": "...get your API key for authentication..." } ], "totalResults": 2, "totalPages": 121 }

Search Features:

  • Keyword matching in titles and content
  • Metadata boost (tags, categories, use cases)
  • Score-based ranking
  • Content snippets around matches
  • Uses cached pages when available for faster results

Note: Currently uses embeddings-based semantic search. Multiple strategies available (see Search section).
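A minimal client-side sketch of calling this endpoint (the base URL is a placeholder; substitute your deployment's origin):

// Hedged usage sketch: query /search and print ranked results.
// BASE_URL is a placeholder; substitute your deployment's origin.
const BASE_URL = "https://your-deployment.example.com";

const params = new URLSearchParams({ q: "authentication", limit: "5" });
const res = await fetch(`${BASE_URL}/search?${params}`);
const { query, results, totalResults } = await res.json();

console.log(`${totalResults} result(s) for "${query}"`);
for (const r of results) {
  console.log(`${r.score}  ${r.title} (${r.path})`);
}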

GET /answer

Answer questions using RAG (Retrieval-Augmented Generation).

Query Parameters:

  • q (required) - Question to answer
  • limit (optional) - Max search results to consider (default: 10)
  • minScore (optional) - Minimum search score threshold (default: 0)
  • maxContextPages (optional) - Max pages to include in LLM context (default: 5)
  • temperature (optional) - LLM temperature 0-1 (default: 0.3)
  • model (optional) - Override LLM model (default: llama-3.3-70b-versatile)

Example:

GET /answer?q=How+do+I+authenticate+with+the+API&maxContextPages=5

Response:

{ "answer": "To authenticate with the Groq API, you need to...", "query": "How do I authenticate with the API?", "searchResults": [ { "path": "api-reference", "url": "https://console.groq.com/docs/api-reference.md", "title": "API Reference", "score": 92.5 } ], "contextUsed": 5, "totalTokens": 8500, "metadata": { "strategy": "llama-3.3-70b-default", "model": "llama-3.3-70b-versatile", "temperature": 0.3, "searchResultsCount": 10, "timings": { "search": 45.2, "contextPrep": 5.1, "llm": 1250.3, "total": 1300.6 } } }

How it works:

  1. Uses active search strategy to find relevant documentation
  2. Retrieves full content from top N results
  3. Formats documentation as context for the LLM
  4. Calls Groq API (Llama 3.3 70B) to generate an answer
  5. Returns markdown-formatted answer with sources

See /answer/ folder for available strategies and documentation.

GET /answer/info

Get information about the active answer strategy.

Response:

{ "strategy": { "name": "llama-3.3-70b-default", "description": "RAG using active search strategy + Llama 3.3 70B with up to 5 doc pages in context" }, "defaultOptions": { "model": "llama-3.3-70b-versatile", "temperature": 0.3, "maxContextPages": 5 }, "availableParams": { "q": "Query string (required)", "limit": "Max search results to consider (default: 10)", "minScore": "Minimum search score threshold (default: 0)", "maxContextPages": "Max pages to include in LLM context (default: 5)", "temperature": "LLM temperature (default: 0.3)", "model": "Override LLM model (optional)" } }

GET /answer/test

Run test queries against the active answer strategy.

Response:

{ "strategy": { "name": "llama-3.3-70b-default", "description": "..." }, "totalQueries": 1, "tests": [ { "query": "What is Compound and how does it work?", "answer": "markdown formatted answer...", "searchResults": [ { "path": "agentic-tooling/compound-beta", "url": "https://...", "title": "Compound", "score": 95.2 } ], "contextUsed": 5, "totalTokens": 8500, "durationMs": 1250.5, "timings": { "search": 45.2, "contextPrep": 5.1, "llm": 1200.2, "total": 1250.5 } } ], "summary": { "totalDurationMs": 1250.5, "avgDurationMs": 1250.5, "avgSearchMs": 45.2, "avgContextPrepMs": 5.1, "avgLlmMs": 1200.2, "avgTotalMs": 1250.5, "totalContextUsed": 5, "totalTokens": 8500, "errors": 0 } }

GET /data

Get metadata for all pages (does not use cache - fetches fresh).

Response:

{ "pages": [ { "url": "...", "charCount": 1234, "frontmatter": {...} } ], "contents": [...], "totalPages": 121, "totalChars": 1234567 }

Cache Management Endpoints

GET /cache/stats

Get cache statistics.

Response:

{ "cachedPages": 121, "totalTokens": 1234567 }

GET /cache/clear

Clear the entire cache.

Response:

{ "message": "Cache cleared", "success": true }

GET /cache/clear/:path

Clear cache for a specific page.

Example:

GET /cache/clear/api-reference

Response:

{ "message": "Cache cleared for api-reference", "success": true }

GET /cache/recalculate

Recalculate pages with AI metadata and embeddings generation.

Query Parameters:

  • force (optional): Set to true to force recalculation of all pages, ignoring hash checks

Default Mode (no query params):

GET /cache/recalculate

Force Mode:

GET /cache/recalculate?force=true

Response (Default Mode):

{ "message": "Recalculated 5 pages, skipped 116 unchanged pages", "results": [ { "path": "api-reference", "url": "https://console.groq.com/docs/api-reference.md", "charCount": 1234, "tokenCount": 567, "title": "API Reference", "metadata": { "categories": ["API", "Reference"], "tags": ["api", "endpoints", "rest"], "useCases": ["Integrating with Groq API"], "questions": ["How do I authenticate?", "What endpoints are available?"] } }, { "path": "docs", "skipped": true, "reason": "Content unchanged (hash matches)" } ], "totalPages": 121, "processed": 5, "skipped": 116, "withMetadata": 5, "withoutMetadata": 0, "cached": true, "force": false }

Response (Force Mode):

{ "message": "Recalculated 121 pages with AI metadata (force mode)", "results": [...], "totalPages": 121, "processed": 121, "skipped": 0, "force": true }

What it does:

  • Fetches all pages (or skips unchanged ones in default mode)
  • Calculates token counts
  • Generates AI metadata (categories, tags, use cases, questions)
  • Generates embeddings via the active search strategy (see the Search section)
  • Calculates content hashes for change detection
  • Stores everything in cache

Important: This can take several minutes depending on:

  • Number of pages to process (skipped pages are fast)
  • Network speed
  • Token calculation time
  • AI metadata generation time (uses Groq API)

Cache Behavior

How Caching Works

  1. First Request:

    • Check cache → Not found
    • Fetch from URL
    • Calculate tokens
    • Store in cache
    • Return data
  2. Subsequent Requests:

    • Check cache → Found
    • Return cached data immediately
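
A minimal sketch of this cache-aside flow, reusing the helper names exported by utils.ts (their exact signatures are assumptions):

// Hedged sketch of the cache-aside flow behind /page/:path.
// Helper names come from utils.ts; exact signatures are assumptions.
import { getFromCache, setCache, getTextFromUrl, calculateTokenCount } from "./utils.ts";

async function getPage(url: string) {
  const cached = await getFromCache(url); // 1. check cache
  if (cached) return cached;              // found: return immediately

  const content = await getTextFromUrl(url);             // 2. fetch from URL
  const tokenCount = await calculateTokenCount(content); // 3. calculate tokens
  const page = { url, content, charCount: content.length, tokenCount };

  await setCache(url, page); // 4. store in cache
  return page;               // 5. return data
}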

Cache Storage

Cache is stored in SQLite with the following schema:

CREATE TABLE groq_docs_cache_v3 (
  url TEXT PRIMARY KEY,
  content TEXT NOT NULL,
  charCount INTEGER NOT NULL,
  tokenCount INTEGER,
  frontmatter TEXT NOT NULL,
  metadata TEXT,
  contentHash TEXT,
  embeddings TEXT,
  cachedAt INTEGER NOT NULL
)

Fields:

  • url - Source URL (primary key)
  • content - Full page content with frontmatter
  • charCount - Character count
  • tokenCount - Token count (calculated with tiktoken)
  • frontmatter - Parsed frontmatter (JSON)
  • metadata - AI-generated metadata (categories, tags, use cases, questions)
  • contentHash - SHA-256 hash of content (for change detection)
  • embeddings - Content embeddings vector (JSON array)
  • cachedAt - Timestamp when cached

Cache Invalidation

Cache is invalidated when:

  • You manually clear it via /cache/clear
  • You recalculate via /cache/recalculate
  • Cache is cleared for a specific page via /cache/clear/:path

Note: Cache does NOT automatically expire. If documentation changes, you must manually recalculate.

Adding New Pages

  1. Add URL to the urls array in main.tsx:

    const urls = [
      // ... existing URLs
      "https://console.groq.com/docs/new-page.md",
    ];
  2. Run recalculate:

    GET /cache/recalculate
  3. Verify:

    GET /cache/stats
    GET /list   # Should include your new page

Token Counting

Token counts are calculated using tiktoken with the gpt-4 encoding (cl100k_base). This is the same encoding used by:

  • GPT-4
  • GPT-3.5-turbo
  • Many other OpenAI models

Token counts are:

  • Calculated on first fetch
  • Stored in cache
  • Returned in API responses
  • Expensive to compute (which is why caching is important)
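
As a rough illustration of the counting itself (the repo's calculateTokenCount may use a different tiktoken binding):

// Hedged sketch: count tokens with the cl100k_base (GPT-4) encoding
// via js-tiktoken; utils.ts may bind tiktoken differently.
import { getEncoding } from "npm:js-tiktoken";

const enc = getEncoding("cl100k_base");

function countTokens(text: string): number {
  return enc.encode(text).length;
}

console.log(countTokens("Hello, Groq docs!")); // prints the token count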

AI Metadata Generation

Each page can have AI-generated metadata using Groq's chat completions API:

  • Categories: 2-4 broad categories (e.g., "API", "Authentication", "Models")
  • Tags: 5-10 specific tags/keywords
  • Use Cases: 2-4 practical use cases or scenarios
  • Questions: 5-10 questions users might ask

Metadata is generated during /cache/recalculate and stored in the cache.
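
A hedged sketch of what this generation might look like against Groq's OpenAI-compatible chat completions endpoint; the real prompt and parsing live in groq.ts (generatePageMetadata):

// Hedged sketch: request structured metadata from Groq's chat completions
// API. Prompt wording, model choice, and truncation are assumptions.
async function generateMetadataSketch(content: string) {
  const res = await fetch("https://api.groq.com/openai/v1/chat/completions", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${Deno.env.get("GROQ_API_KEY")}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "llama-3.3-70b-versatile",
      response_format: { type: "json_object" },
      messages: [{
        role: "user",
        content:
          "Return JSON with keys categories (2-4), tags (5-10), useCases (2-4), " +
          "and questions (5-10) for this documentation page:\n\n" +
          content.slice(0, 8000), // assumed truncation to fit context
      }],
    }),
  });
  const data = await res.json();
  return JSON.parse(data.choices[0].message.content);
}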

Search

The API includes a search endpoint (/search) that allows you to search across all documentation pages using various semantic search strategies.

Available Search Strategies

The search system supports multiple strategies that can be switched by commenting/uncommenting imports in search/index.ts. Each strategy has different trade-offs in terms of speed, accuracy, and infrastructure requirements.

🏆 Recommended: Local ONNX Models (Fastest)

File: search/transformers-local-onnx.ts

Pre-downloaded ONNX models for the fastest embedding generation with zero network overhead.

Performance: ~10-30ms per query (after initial ~50ms model load)

Advantages:

  • ✅ No network calls - works completely offline
  • ✅ No downloads on first run - instant startup
  • ✅ No isolate loading delays - perfect for serverless
  • ✅ Same accuracy as the auto-downloaded model
  • ✅ Perfect for production - predictable performance

Setup:

  1. Download the model:
    cd search/models
    ./download-model.sh
  2. Activate in search/index.ts:
    import { searchStrategy, generateEmbeddings } from "./transformers-local-onnx.ts";

Requirements: ~23MB disk space for model files

See search/models/SETUP.md for detailed setup instructions.

Alternative: Transformers.js with Auto-Download

File: search/transformers-cosine.ts

Uses Transformers.js with automatic model downloading from Hugging Face.

Performance:

  • First run: ~3-5s (downloads ~23MB model)
  • Cached: ~150ms model load + ~10-30ms per query

Advantages:

  • ✅ No API keys needed
  • ✅ Works in browser and Deno
  • ✅ Automatic caching

Disadvantages:

  • ❌ Slow first run (downloads model)
  • ❌ Isolate loading delays in serverless environments
  • ❌ May not work in some restricted environments

Other Strategies (API-Based)

All require API keys but offer different trade-offs:

| Strategy | File | Speed | Cost | Pros |
| --- | --- | --- | --- | --- |
| Mixedbread | mixedbread-embeddings-cosine.ts | ~50-100ms | Free tier | High quality, 1024 dims |
| OpenAI | openai-cosine.ts | ~100-200ms | Paid | High quality, reliable |
| HuggingFace | hf-inference-qwen3-cosine.ts | ~150-300ms | Free tier | Qwen3-8B model |
| Cloudflare | cloudflare-bge-cosine.ts | ~50-150ms | Free tier | Works on CF Workers |
| JigsawStack | jigsawstack-orama.ts | ~550ms | Free tier | Managed search |

Switching Strategies

Edit search/index.ts and comment/uncomment the desired strategy:

// Comment out current strategy
// import { searchStrategy, generateEmbeddings } from "./transformers-cosine.ts";

// Uncomment desired strategy
import { searchStrategy, generateEmbeddings } from "./transformers-local-onnx.ts";

Current Implementation (Semantic Search)

The search system uses semantic embeddings for intelligent search:

  • Understands meaning, not just keywords
  • Finds relevant results even with different wording
  • Returns ranked results with similarity scores
  • Includes content snippets with highlighted matches
  • Uses cosine similarity for fast comparison

Search Architecture

  1. Embedding Generation: Content is converted to 384-dimensional vectors
  2. Cosine Similarity: Query embeddings compared against page embeddings
  3. Ranking: Results sorted by similarity score
  4. Snippet Generation: Context-aware snippets around relevant content
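
The similarity step is small enough to show inline; a minimal version of the cosine ranking (the real helpers live in search/utils.ts):

// Cosine similarity between two embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank cached pages against a query embedding (the page shape is illustrative).
function rank(queryVec: number[], pages: { path: string; embeddings: number[] }[]) {
  return pages
    .map((p) => ({ path: p.path, score: cosineSimilarity(queryVec, p.embeddings) }))
    .sort((x, y) => y.score - x.score);
}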

Answer Strategies (RAG)

The API includes answer generation using Retrieval-Augmented Generation (RAG) - combining semantic search with LLM inference to answer questions about the documentation.

How RAG Works

  1. Search: Use active search strategy to find relevant documentation pages
  2. Retrieve: Get full content from top N search results
  3. Format: Package documentation as context for the LLM
  4. Generate: Call Groq LLM to generate an answer based on the context
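
A condensed sketch of these four steps; searchDocs is a hypothetical stand-in for the active search strategy, and groqChatCompletion is the name exported by groq.ts (its signature is assumed):

// Hedged sketch of the RAG flow; searchDocs is hypothetical and the
// groqChatCompletion signature is an assumption (the name is from groq.ts).
import { groqChatCompletion } from "./groq.ts";

// Hypothetical search entry point; the real one lives in search/index.ts.
declare function searchDocs(q: string): Promise<
  { title: string; url: string; content: string; score: number }[]
>;

async function answerQuestion(q: string, maxContextPages = 5): Promise<string> {
  // 1. Search: find relevant pages with the active strategy
  const results = await searchDocs(q);

  // 2. Retrieve: take the top N results with their full content
  const top = results.slice(0, maxContextPages);

  // 3. Format: package the documentation as LLM context
  const context = top
    .map((r) => `## ${r.title} (${r.url})\n${r.content}`)
    .join("\n\n---\n\n");

  // 4. Generate: ask the LLM to answer from the context
  const prompt = `Answer using only this documentation:\n\n${context}\n\nQuestion: ${q}`;
  return await groqChatCompletion(prompt);
}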

Available Answer Strategies

Answer strategies are located in the /answer/ folder and can be switched by editing answer/index.ts.

🏆 Current Default: llama-3.3-70b-default

File: answer/llama-3.3-70b-default.ts

Uses Groq's Llama 3.3 70B model with up to 5 documentation pages in context.

Performance: ~1-3s total (depends on search + LLM response time)

Configuration:

  • Model: llama-3.3-70b-versatile
  • Max context pages: 5
  • Tokens per page: ~2000
  • Temperature: 0.3 (focused/deterministic)

Advantages:

  • ✅ High-quality answers with good reasoning
  • ✅ Handles complex questions well
  • ✅ Includes relevant citations
  • ✅ Markdown-formatted responses

Usage:

# Basic question
GET /answer?q=How+do+I+use+streaming

# With options
GET /answer?q=What+models+are+available&maxContextPages=3&temperature=0.5

# Different model
GET /answer?q=Quick+question&model=llama-3.1-8b-instant

Creating New Answer Strategies

You can create custom strategies for different use cases:

Ideas for new strategies:

  • llama-3.1-8b-fast: Faster responses with smaller model (good for simple questions)
  • mixtral-8x7b-extended: More context pages with Mixtral's larger context window
  • llama-3.3-70b-code: Specialized prompts for code examples and API usage
  • citation-mode: Include specific citations/references in answers
  • multi-step: Break down complex questions into sub-questions

See answer/README.md and answer/QUICK-START.md for detailed documentation and guides.

Answer Strategy Architecture

Each strategy implements:

  1. Search integration: Uses active search strategy from /search/
  2. Context management: Formats pages to fit context window
  3. LLM configuration: Model selection, temperature, prompts
  4. Response formatting: Structures answer with metadata
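
The real definitions live in answer/types.ts; as a hypothetical illustration, a strategy might have this shape:

// Hypothetical strategy shape; the actual types are in answer/types.ts.
interface AnswerOptions {
  limit?: number;           // max search results to consider
  minScore?: number;        // minimum search score threshold
  maxContextPages?: number; // pages to include in LLM context
  temperature?: number;
  model?: string;
}

interface AnswerStrategy {
  name: string;        // e.g. "llama-3.3-70b-default"
  description: string;
  answer(query: string, options?: AnswerOptions): Promise<{
    answer: string;
    searchResults: { path: string; url: string; title: string; score: number }[];
    contextUsed: number;
    totalTokens: number;
  }>;
}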

Tuning Answer Quality

For better answers:

  • Increase maxContextPages (more documentation context)
  • Raise minScore (only use highly relevant pages)
  • Use larger models (70B+ for complex reasoning)
  • Lower temperature (0.2-0.3 for factual accuracy)

For faster responses:

  • Decrease maxContextPages (less context to process)
  • Use smaller models (8B for simple questions)
  • Use faster search strategies

For creative responses:

  • Increase temperature (0.6-0.8)
  • Use models good at creative writing
  • Adjust system prompts

Embeddings

Content embeddings are generated for each page using the active search strategy (see Search section above).

Current Default: Local ONNX models (transformers-local-onnx.ts)

  • Model: all-MiniLM-L6-v2
  • Dimensions: 384
  • Generation: ~10-30ms per page
  • Storage: Cached as JSON arrays in SQLite

Embeddings are:

  • Generated during /cache/recalculate
  • Stored in cache for fast retrieval
  • Used for semantic search and similarity matching
  • Portable across different strategies (same dimensions)
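
For reference, generating a compatible 384-dimensional embedding with Transformers.js might look like this (the repo's strategy files wrap this differently):

// Hedged sketch: embed text with all-MiniLM-L6-v2 via Transformers.js.
import { pipeline } from "npm:@xenova/transformers";

const extractor = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");

async function embed(text: string): Promise<number[]> {
  const output = await extractor(text, { pooling: "mean", normalize: true });
  return Array.from(output.data as Float32Array); // 384 numbers
}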

Hash-Based Change Detection

Content hashes (SHA-256) are calculated and stored for each page. This enables:

  • Smart recalculation: Skip unchanged pages automatically
  • Efficient updates: Only process pages that have actually changed
  • Performance: Significantly faster recalculation when most content is unchanged

Hashes are compared during /cache/recalculate (default mode) to determine if a page needs reprocessing.
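
A minimal version of this hashing, using the Web Crypto API available in Deno (utils.ts exports calculateContentHash, whose implementation may differ):

// Content hashing with the Web Crypto API.
async function calculateContentHash(content: string): Promise<string> {
  const bytes = new TextEncoder().encode(content);
  const digest = await crypto.subtle.digest("SHA-256", bytes);
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}

// During default-mode recalculation, a page whose hash matches the cached
// contentHash field (see the SQLite schema above) can be skipped.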

Troubleshooting

Cache seems stale

Run /cache/recalculate to refresh everything.

Page not found

  1. Check /list to see if the path exists
  2. Verify the URL is in the urls array
  3. Ensure the path matches the URL structure (e.g., api-reference for /docs/api-reference.md)

Token counts seem wrong

  1. Clear cache for that page: GET /cache/clear/:path
  2. Request the page again: GET /page/:path
  3. Or recalculate everything: GET /cache/recalculate

Performance issues

  • Use /page/:path endpoints (cached) instead of /data (uncached)
  • Check cache stats: GET /cache/stats
  • Ensure cache is populated before production use

Code Structure

The codebase is organized into modular files:

  • main.tsx - Main Hono app, routes, and URL definitions
  • utils.ts - Utility functions:
    • Cache management (getFromCache, setCache, clearCache, getCacheStats)
    • Content fetching (getTextFromUrl)
    • Frontmatter parsing (parseFrontmatter, addUrlSourceToFrontmatter)
    • Token counting (calculateTokenCount)
    • Hash calculation (calculateContentHash)
    • Rate limiting for fetches
  • groq.ts - Groq API functions:
    • Chat completions (groqChatCompletion)
    • Metadata generation (generatePageMetadata)
  • search/ - Search strategies with pluggable implementations:
    • index.ts - Main entry point, switches between strategies
    • types.ts - Type definitions for search
    • utils.ts - Shared utilities (cosine similarity, snippets)
    • Multiple strategy files (transformers-local-onnx, mixedbread, openai, etc.)
  • answer/ - Answer strategies with pluggable RAG implementations:
    • index.ts - Main entry point, switches between strategies
    • types.ts - Type definitions for answers
    • utils.ts - Shared utilities (context formatting, token estimation)
    • Multiple strategy files (llama-3.3-70b-default, etc.)

Development

Local Development

# Start the server
deno task serve

# Or manually
deno run --allow-net --allow-env main.tsx

Note: SQLite caching is automatically disabled when running locally (detected via valtown environment variable). The app will work without caching, but cache-related endpoints will return appropriate messages.

Deno Tasks

The project includes several convenience tasks defined in deno.json:

Server Tasks

# Start the development server deno task serve

Recalculation Tasks

# Recalculate with active search strategy (smart mode, skips unchanged pages)
deno task recalc

# Force recalculation (recalculates all pages)
deno task recalc-f

# Recalculate with Mixedbread embeddings strategy
deno task recalc-mxbai

# Force recalculation with Mixedbread embeddings
deno task recalc-mxbai-f

Testing Tasks

# Test search strategy with detailed timing breakdown
deno task search

# Test answer strategy with detailed timing breakdown and search results
deno task answer

Test Output Features:

  • ⏱️ Comprehensive timing breakdown (search, context prep, LLM call, total)
  • 📊 Shows search results used for context
  • 💬 Displays generated answers (truncated for readability)
  • 📈 Summary statistics (averages, totals)
  • 🎯 Strategy information

Example test output:

⏱️  Timing breakdown:
   Search: 45.2ms
   Context prep: 5.1ms
   LLM call: 1200.2ms
   Total: 1250.5ms

📚 Search results used (top 5):
✓ 1. Compound
     Path: agentic-tooling/compound-beta
     Score: 95.20

💬 Generated Answer:
──────────────────────────────────────────────────────────────────────────────
Compound is a beta feature...
(answer continues)
──────────────────────────────────────────────────────────────────────────────

Val Town

The app is configured to work with Val Town. The default export adapts to the runtime:

export default (typeof Deno !== "undefined" && Deno.env.get("valtown")) ? app.fetch : app;

SQLite caching is automatically enabled when running in Val Town (detected via valtown environment variable).

Environment Variables

  • GROQ_API_KEY - Needed for AI metadata generation; if unset, metadata generation is disabled
  • valtown - Automatically set by Val Town (detects environment)

Performance Tips

  1. Use default recalculate mode - Automatically skips unchanged pages
  2. Cache is your friend - Always populate cache before production use
  3. Rate limiting - Built-in rate limiting prevents WAF blocking (1 request per 3 seconds for docs, 2 requests per second for Groq API)
  4. Hash checking - Default recalculation mode is much faster when most content is unchanged
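
A sketch of how that rate limiting might be wired with async-sema (the actual limiter setup in utils.ts may differ):

// Hedged sketch of rate limiting with async-sema's RateLimit helper.
import { RateLimit } from "npm:async-sema";

const docsLimit = RateLimit(1, { timeUnit: 3000 }); // 1 request per 3 seconds (docs)
const groqLimit = RateLimit(2);                     // 2 requests per second (Groq API)

async function fetchDoc(url: string): Promise<Response> {
  await docsLimit(); // resolves when a slot is free
  return await fetch(url);
}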