# Answer Strategies

Answer strategies implement Retrieval-Augmented Generation (RAG) to answer user questions using documentation search and LLM inference.

## How It Works

Each answer strategy follows this general pattern (a code sketch follows the list):

1. **Search:** use the active search strategy to find relevant documentation pages
2. **Retrieve:** get the full content of the top N search results
3. **Format:** format the documentation as context for the LLM
4. **Generate:** call the Groq LLM with the context and the user's question to produce an answer
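
Here is a minimal sketch of how those four steps might compose. The search and retrieval helpers are passed in as parameters because this repo's actual helper names aren't shown here; the Groq call uses the public OpenAI-compatible chat completions endpoint:

```ts
type SearchHit = { path: string; title: string; score: number };
type Page = { title: string; content: string };

async function answerQuestion(
  query: string,
  search: (q: string, opts: { limit: number }) => Promise<SearchHit[]>,
  getPage: (path: string) => Promise<Page>,
): Promise<string> {
  // 1. Search: find candidate documentation pages
  const hits = await search(query, { limit: 10 });

  // 2. Retrieve: fetch full content for the top N results
  const pages = await Promise.all(hits.slice(0, 5).map((h) => getPage(h.path)));

  // 3. Format: join the pages into one context block
  const context = pages
    .map((p) => `## ${p.title}\n\n${p.content}`)
    .join("\n\n---\n\n");

  // 4. Generate: ask the Groq LLM to answer from that context
  const res = await fetch("https://api.groq.com/openai/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${Deno.env.get("GROQ_API_KEY")}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "llama-3.3-70b-versatile",
      temperature: 0.3,
      messages: [
        { role: "system", content: "Answer using only the provided documentation." },
        { role: "user", content: `${context}\n\nQuestion: ${query}` },
      ],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content as string;
}
```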

## Active Strategy

The active strategy is selected in `answer/index.ts` by uncommenting the desired import.
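
The selection might look something like this (the exact export shape in this repo is an assumption; check `answer/index.ts` itself):

```ts
// answer/index.ts: exactly one strategy import is live at a time.
// export { answerStrategy } from "./llama-3.1-8b-fast.ts";
export { answerStrategy } from "./llama-3.3-70b-default.ts";
```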

### Current Active Strategy

`llama-3.3-70b-default` - the default strategy, using Llama 3.3 70B:

- Uses up to 5 documentation pages in context
- Truncates each page to ~2000 tokens to manage the context window
- Temperature: 0.3 (relatively focused/deterministic)
- Model: `llama-3.3-70b-versatile`

## Available Strategies

### 1. `llama-3.3-70b-default.ts`

**Best for:** general-purpose Q&A with a good balance of quality and speed

- Model: `llama-3.3-70b-versatile`
- Context: up to 5 pages, 2000 tokens each
- Temperature: 0.3 (default)
- Performance: ~1-3s, depending on search and LLM response time

Configuration options:

```ts
{
  limit: 10,                        // Search results to consider
  minScore: 0,                      // Minimum search score
  maxContextPages: 5,               // Pages to include in context
  temperature: 0.3,                 // LLM temperature
  model: "llama-3.3-70b-versatile", // Override model
}
```

## Creating New Strategies

To create a new answer strategy:

1. Create a new file in `/answer/` (e.g., `llama-3.1-8b-fast.ts`)
2. Implement the `AnswerStrategy` interface (a filled-in example follows this list):

   ```ts
   import type { AnswerStrategy, AnswerResult, AnswerOptions } from "./types.ts";

   export const answerStrategy: AnswerStrategy = {
     name: "your-strategy-name",
     description: "Brief description of your strategy",
     answer: async (query: string, options: AnswerOptions = {}): Promise<AnswerResult> => {
       // Your implementation here:
       // 1. Search for relevant pages
       // 2. Get full content
       // 3. Format as context
       // 4. Call LLM
       // 5. Return AnswerResult
     },
   };
   ```

3. Add the import to `answer/index.ts` and comment out the other strategies
4. Test your strategy
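
As a concrete illustration, here is a minimal sketch of a complete strategy that reuses the default strategy with a smaller model and context budget. It assumes `llama-3.3-70b-default.ts` exports its strategy as `answerStrategy` and honors the options shown above; verify against the actual files before copying:

```ts
import type { AnswerOptions, AnswerResult, AnswerStrategy } from "./types.ts";
// Assumption: the default strategy is importable like this.
import { answerStrategy as base } from "./llama-3.3-70b-default.ts";

export const answerStrategy: AnswerStrategy = {
  name: "llama-3.1-8b-fast",
  description: "Faster answers: smaller model, fewer context pages",
  answer: (query: string, options: AnswerOptions = {}): Promise<AnswerResult> =>
    base.answer(query, {
      model: "llama-3.1-8b-instant", // smaller, faster Groq model
      maxContextPages: 3,
      ...options, // caller-supplied options still take precedence
    }),
};
```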

## Strategy Ideas

Here are some ideas for additional strategies you might want to implement:

- `llama-3.1-8b-fast`: faster responses with a smaller model and fewer pages
- `mixtral-8x7b-extended`: more context pages, using Mixtral's larger context window
- `llama-3.3-70b-code-focused`: specialized prompts for code examples and API usage
- `multi-step-reasoning`: break complex questions into sub-questions
- `citation-mode`: include specific citations/references in the answer
- `conversational`: support follow-up questions with conversation history

## API Usage

### Basic Answer Request

```
GET /answer?q=How+do+I+use+the+Groq+API
```

### With Options

```
GET /answer?q=What+models+are+available&maxContextPages=3&temperature=0.5
```

### Get Strategy Info

```
GET /answer/info
```
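
From TypeScript, the same endpoint can be called with `fetch`; the base URL below is a placeholder for wherever this val is deployed:

```ts
// Placeholder base URL: substitute your actual deployment.
const base = "https://yourname-groqdocs.web.val.run";

const url = new URL("/answer", base);
url.searchParams.set("q", "What models are available?");
url.searchParams.set("maxContextPages", "3");
url.searchParams.set("temperature", "0.5");

const res = await fetch(url);
const { answer, metadata } = await res.json();
console.log(answer, metadata.timings);
```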

### Response Format

{ "answer": "markdown formatted answer", "query": "user's original question", "searchResults": [ { "path": "api/reference", "url": "https://...", "title": "API Reference", "score": 95.2 } ], "contextUsed": 5, "totalTokens": 8500, "metadata": { "strategy": "llama-3.3-70b-default", "model": "llama-3.3-70b-versatile", "temperature": 0.3, "searchResultsCount": 10, "timings": { "search": 45.2, "contextPrep": 5.1, "llm": 1250.3, "total": 1300.6 } } }

## Tuning Your Strategy

### Context Size

- More pages = more comprehensive answers, but slower and more expensive
- Fewer pages = faster, but might miss relevant info
- Recommended: 3-5 pages for most questions

### Token Limits

- Per page: 1000-3000 tokens is a good range (a truncation sketch follows)
- Total context: stay under 32k tokens for most models
- Llama 3.3 70B supports up to 128k tokens of context, but quality may degrade as you approach the limit
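
One way to implement the per-page cap is sketched below; it approximates tokens as ~4 characters apiece, which is a common rule of thumb rather than an exact tokenizer:

```ts
// Approximate truncation: ~4 characters per token is a rough heuristic.
function truncateToTokens(text: string, maxTokens: number): string {
  const maxChars = maxTokens * 4;
  if (text.length <= maxChars) return text;
  // Prefer cutting at a paragraph break so pages don't end mid-sentence.
  const cut = text.slice(0, maxChars);
  const lastBreak = cut.lastIndexOf("\n\n");
  return (lastBreak > 0 ? cut.slice(0, lastBreak) : cut) + "\n\n[truncated]";
}
```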

### Temperature

- 0.0-0.3: focused, deterministic answers (good for facts)
- 0.4-0.7: balanced creativity and accuracy
- 0.8-1.0: more creative but less consistent (avoid for docs Q&A)

### Search Configuration

- Higher `limit`: more pages to choose from (10-20 recommended)
- Higher `minScore`: only use highly relevant pages (a 50-70 threshold)
- Balance precision (fewer, higher-quality pages) against recall (more pages, at the cost of added noise)

## Best Practices

1. **Match model to task:**
   - Simple factual questions: smaller/faster models
   - Complex reasoning: larger models (70B+)
   - Code generation: models fine-tuned for code
2. **System prompts matter:**
   - Be specific about the documentation domain (the Groq API)
   - Set expectations for answer format (markdown, citations, etc.)
   - Provide guidelines for handling missing information
   - Include URL formatting instructions (e.g., "remove .md extensions")
3. **Manage the context window:**
   - Don't just dump in every page; be selective
   - Truncate long pages intelligently
   - Consider summarizing very long pages first
4. **Error handling** (a fallback sketch follows this list):
   - Gracefully handle search failures
   - Provide fallback responses when the LLM fails
   - Include relevant metadata for debugging
5. **Post-processing:**
   - Use `cleanupMarkdownLinks()` to remove .md extensions from generated links
   - Add custom post-processing for your specific needs
   - Validate links and citations if needed
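
Here is a sketch of the error-handling practice, with the search and generation steps passed in as functions since this repo's actual helper names aren't shown here:

```ts
type SearchHit = { path: string; url: string; title: string; score: number };

async function answerWithFallback(
  query: string,
  search: (q: string) => Promise<SearchHit[]>,
  generate: (q: string, hits: SearchHit[]) => Promise<string>,
): Promise<{ answer: string; searchResults: SearchHit[]; error?: string }> {
  let hits: SearchHit[] = [];
  try {
    hits = await search(query);
  } catch (err) {
    // Search failed: continue with an empty context rather than throwing.
    console.error("search failed:", err);
  }
  try {
    return { answer: await generate(query, hits), searchResults: hits };
  } catch (err) {
    // LLM failed: return a fallback answer plus debugging metadata.
    return {
      answer: "Sorry, I couldn't generate an answer right now.",
      searchResults: hits,
      error: String(err),
    };
  }
}
```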

## URL Cleanup

Answers automatically clean up markdown links to remove .md extensions:

```
// Before cleanup
[Compound](https://console.groq.com/docs/agentic-tooling/compound-beta.md)

// After cleanup
[Compound](https://console.groq.com/docs/agentic-tooling/compound-beta)
```

This happens through:

1. **System prompt:** instructs the LLM to avoid .md extensions
2. **Post-processing:** the `cleanupMarkdownLinks()` function removes any remaining .md extensions (a sketch follows)
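
A plausible implementation of `cleanupMarkdownLinks()` is sketched below; the actual version in this repo's utils may differ:

```ts
function cleanupMarkdownLinks(text: string): string {
  // Turn [Title](.../page.md) into [Title](.../page)
  return text.replace(/\]\(([^)\s]+)\.md\)/g, "]($1)");
}
```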

To disable cleanup (not recommended):

```ts
// In your custom strategy, skip the cleanup:
return {
  answer: llmResponse, // don't call cleanupMarkdownLinks()
  // ... rest of result
};
```

## Testing

Test your strategy with various question types (a smoke-test loop follows the list):

- Factual: "What models does Groq support?"
- How-to: "How do I authenticate with the API?"
- Comparison: "What's the difference between model X and Y?"
- Code examples: "Show me an example of streaming responses"
- Complex: "How do I implement rate limiting with retries?"
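
A throwaway smoke-test loop over those question types might look like this (the base URL is again a placeholder for your deployment):

```ts
const questions = [
  "What models does Groq support?",
  "How do I authenticate with the API?",
  "What's the difference between model X and Y?",
  "Show me an example of streaming responses",
  "How do I implement rate limiting with retries?",
];

for (const q of questions) {
  const res = await fetch(
    `https://yourname-groqdocs.web.val.run/answer?q=${encodeURIComponent(q)}`,
  );
  const { answer, metadata } = await res.json();
  // Print a short preview of each answer plus the strategy that produced it.
  console.log(`Q: ${q}\nA: ${answer.slice(0, 120)}...\n(${metadata.strategy})\n`);
}
```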

## Performance Benchmarks

Target performance for answer strategies:

- Search: 50-500ms (depends on the active search strategy)
- Context prep: <50ms
- LLM call: 500-3000ms (depends on model and response length)
- Total: 1-4s for most queries

Strategies taking >5s should be optimized or marked as "detailed" mode.
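
The per-stage numbers reported in `metadata.timings` can be captured with a small helper along these lines (a sketch, not this repo's actual code):

```ts
// Run a stage, record how long it took, and pass its result through.
async function timed<T>(
  timings: Record<string, number>,
  stage: string,
  fn: () => Promise<T>,
): Promise<T> {
  const start = performance.now();
  try {
    return await fn();
  } finally {
    timings[stage] = performance.now() - start;
  }
}

// Usage: const hits = await timed(timings, "search", () => search(query));
```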
