Answer strategies implement Retrieval-Augmented Generation (RAG) to answer user questions using documentation search and LLM inference.
Each answer strategy follows this general pattern:
- Search: Use the active search strategy to find relevant documentation pages
- Retrieve: Get full content from the top N search results
- Format: Format the documentation as context for the LLM
- Generate: Call the Groq LLM with the context and user question to generate an answer
The active strategy is selected in answer/index.ts by uncommenting the desired import.
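The selection file might look roughly like this sketch (the contents are an assumption for illustration, not copied from the repo):

```ts
// answer/index.ts — leave exactly one strategy uncommented (illustrative sketch).
export { answerStrategy } from "./llama-3.3-70b-default.ts";
// export { answerStrategy } from "./llama-3.1-8b-fast.ts";
// export { answerStrategy } from "./citation-mode.ts";
```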
llama-3.3-70b-default - Default strategy using Llama 3.3 70B
- Uses up to 5 documentation pages in context
- Truncates each page to ~2000 tokens to manage the context window
- Temperature: 0.3 (relatively focused/deterministic)
- Model: `llama-3.3-70b-versatile`
- Best for: General purpose Q&A with a good balance of quality and speed
- Model: `llama-3.3-70b-versatile`
- Context: Up to 5 pages, 2000 tokens each
- Temperature: 0.3 (default)
- Performance: ~1-3s depending on search and LLM response time
Configuration options:
```ts
{
  limit: 10,                        // Search results to consider
  minScore: 0,                      // Minimum search score
  maxContextPages: 5,               // Pages to include in context
  temperature: 0.3,                 // LLM temperature
  model: "llama-3.3-70b-versatile"  // Override model
}
```
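As a usage sketch, a caller could override any of these per request (the import path is assumed; the option names are the ones listed above):

```ts
import { answerStrategy } from "./answer/index.ts";

// Ask for a tighter, more deterministic answer by overriding a few options.
const result = await answerStrategy.answer("How do I stream chat completions?", {
  limit: 15,          // consider more search results...
  minScore: 50,       // ...but only keep highly relevant pages
  maxContextPages: 3, // tighter context
  temperature: 0.2,
});

console.log(result.answer);
```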
To create a new answer strategy:
- Create a new file in `answer/` (e.g., `llama-3.1-8b-fast.ts`)
- Implement the `AnswerStrategy` interface:
```ts
import type { AnswerStrategy, AnswerResult, AnswerOptions } from "./types.ts";

export const answerStrategy: AnswerStrategy = {
  name: "your-strategy-name",
  description: "Brief description of your strategy",
  answer: async (query: string, options: AnswerOptions = {}): Promise<AnswerResult> => {
    // Your implementation here
    // 1. Search for relevant pages
    // 2. Get full content
    // 3. Format as context
    // 4. Call LLM
    // 5. Return AnswerResult
  },
};
```
- Add the import to `answer/index.ts` and comment out other strategies
- Test your strategy
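Putting the steps together, a complete strategy might look roughly like the sketch below. The search and content helpers (`getSearchStrategy`, `getPageContent`), the import paths, and the exact `AnswerResult` fields are assumptions inferred from this document rather than the project's actual code; the Groq call uses the `groq-sdk` chat-completions client.

```ts
// answer/example-strategy.ts — illustrative sketch only; helper names and paths are assumed.
import Groq from "groq-sdk";
import type { AnswerStrategy, AnswerResult, AnswerOptions } from "./types.ts";
import { getSearchStrategy } from "../search/index.ts"; // hypothetical search entry point
import { getPageContent } from "../docs/content.ts";    // hypothetical content fetcher
import { cleanupMarkdownLinks } from "./utils.ts";      // import path assumed

const groq = new Groq({ apiKey: process.env.GROQ_API_KEY }); // runtime/env handling assumed

export const answerStrategy: AnswerStrategy = {
  name: "example-strategy",
  description: "Sketch of the search → retrieve → format → generate pipeline",
  answer: async (query: string, options: AnswerOptions = {}): Promise<AnswerResult> => {
    const { limit = 10, minScore = 0, maxContextPages = 5, temperature = 0.3,
            model = "llama-3.3-70b-versatile" } = options;

    // 1. Search for relevant pages, keeping only sufficiently relevant ones
    const results = (await getSearchStrategy().search(query, { limit }))
      .filter((r) => r.score >= minScore)
      .slice(0, maxContextPages);

    // 2–3. Get full content and format it as context
    const context = (await Promise.all(results.map(async (r) =>
      `# ${r.title}\nURL: ${r.url}\n\n${await getPageContent(r.path)}`
    ))).join("\n\n---\n\n");

    // 4. Call the LLM with the documentation context and the user question
    const completion = await groq.chat.completions.create({
      model,
      temperature,
      messages: [
        { role: "system", content: "Answer questions about the Groq API using only the provided documentation. Use markdown. Do not include .md extensions in links." },
        { role: "user", content: `Documentation:\n${context}\n\nQuestion: ${query}` },
      ],
    });

    // 5. Return the result (field names are inferred from the example response below)
    return {
      answer: cleanupMarkdownLinks(completion.choices[0]?.message?.content ?? ""),
      query,
      searchResults: results,
      contextUsed: results.length,
      metadata: { strategy: "example-strategy", model, temperature },
    };
  },
};
```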
Here are some ideas for additional strategies you might want to implement:
- llama-3.1-8b-fast: Faster responses with smaller model, fewer pages
- mixtral-8x7b-extended: More context pages with Mixtral's larger context window
- llama-3.3-70b-code-focused: Specialized prompts for code examples and API usage
- multi-step-reasoning: Break down complex questions into sub-questions
- citation-mode: Include specific citations/references in the answer
- conversational: Support follow-up questions with conversation history
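Some of these could likely be thin wrappers around an existing strategy. For example, a hypothetical llama-3.1-8b-fast variant might just delegate to the default strategy with different defaults (the import path and model name are assumptions):

```ts
// answer/llama-3.1-8b-fast.ts — sketch of a wrapper strategy, not actual project code.
import type { AnswerStrategy, AnswerOptions } from "./types.ts";
import { answerStrategy as defaultStrategy } from "./llama-3.3-70b-default.ts";

export const answerStrategy: AnswerStrategy = {
  name: "llama-3.1-8b-fast",
  description: "Faster answers: smaller model, fewer context pages",
  answer: (query: string, options: AnswerOptions = {}) =>
    defaultStrategy.answer(query, {
      model: "llama-3.1-8b-instant", // smaller Groq-hosted model
      maxContextPages: 2,
      ...options, // caller overrides still win
    }),
};
```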
```
GET /answer?q=How+do+I+use+the+Groq+API
GET /answer?q=What+models+are+available&maxContextPages=3&temperature=0.5
GET /answer/info
```
{ "answer": "markdown formatted answer", "query": "user's original question", "searchResults": [ { "path": "api/reference", "url": "https://...", "title": "API Reference", "score": 95.2 } ], "contextUsed": 5, "totalTokens": 8500, "metadata": { "strategy": "llama-3.3-70b-default", "model": "llama-3.3-70b-versatile", "temperature": 0.3, "searchResultsCount": 10, "timings": { "search": 45.2, "contextPrep": 5.1, "llm": 1250.3, "total": 1300.6 } } }
- More pages = more comprehensive but slower and more expensive
- Fewer pages = faster but might miss relevant info
- Recommended: 3-5 pages for most questions
- Per page: 1000-3000 tokens is a good range
- Total context: Stay under 32k tokens for most models
- Llama 3.3 70B supports up to 128k context, but quality may degrade
- 0.0-0.3: Focused, deterministic answers (good for facts)
- 0.4-0.7: Balanced creativity and accuracy
- 0.8-1.0: More creative but less consistent (avoid for docs Q&A)
- Higher limit: More pages to choose from (10-20 recommended)
- Higher minScore: Only use highly relevant pages (50-70 threshold)
- Balance precision (fewer, highly relevant pages) against recall (more pages, at the cost of some noise)
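One way to apply these budgets inside a strategy, sketched with a rough 4-characters-per-token heuristic (the heuristic, helper names, and defaults are illustrative, not project code):

```ts
// Rough token budgeting: ~4 characters per token is a common approximation.
const approxTokens = (text: string) => Math.ceil(text.length / 4);

const truncateToTokens = (text: string, maxTokens = 2000) =>
  approxTokens(text) <= maxTokens ? text : text.slice(0, maxTokens * 4) + "\n\n[truncated]";

// Keep only sufficiently relevant pages, then cap how many go into context.
const pickContextPages = <T extends { score: number }>(
  results: T[],
  { minScore = 50, maxContextPages = 5 } = {},
) => results.filter((r) => r.score >= minScore).slice(0, maxContextPages);
```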
- Match model to task:
  - Simple factual questions: Smaller/faster models
  - Complex reasoning: Larger models (70B+)
  - Code generation: Models fine-tuned for code
- System prompts matter:
  - Be specific about the documentation domain (Groq API)
  - Set expectations for answer format (markdown, citations, etc.)
  - Provide guidelines for handling missing information
  - Include URL formatting instructions (e.g., "remove .md extensions")
- Manage context window:
  - Don't just dump all pages - be selective
  - Truncate long pages intelligently
  - Consider summarizing very long pages first
- Error handling:
  - Gracefully handle search failures
  - Provide fallback responses when LLM fails
  - Include relevant metadata for debugging
- Post-processing:
  - Use `cleanupMarkdownLinks()` to remove .md extensions from generated links
  - Add custom post-processing for your specific needs
  - Validate links and citations if needed
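The system-prompt guidance above might translate into something like this sketch (the wording is illustrative, not the project's actual prompt):

```ts
// Illustrative system prompt; adjust to your documentation domain and format rules.
const SYSTEM_PROMPT = `You are a documentation assistant for the Groq API.
Answer only from the provided documentation pages; if the answer is not covered, say so
and suggest where the user might look instead.
Format answers in markdown, with code examples where helpful.
When linking to documentation, use the page URLs as given and remove any .md extension.`;
```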
Answers automatically clean up markdown links to remove .md extensions:
```
// Before cleanup
[Compound](https://console.groq.com/docs/agentic-tooling/compound-beta.md)

// After cleanup
[Compound](https://console.groq.com/docs/agentic-tooling/compound-beta)
```
This happens through:
- System prompt: Instructs the LLM to avoid .md extensions
- Post-processing: The `cleanupMarkdownLinks()` function removes any remaining .md extensions
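The post-processing step amounts to rewriting markdown link targets. A minimal sketch of that idea (not the actual `cleanupMarkdownLinks()` implementation) could be a single regex pass:

```ts
// Strip a trailing ".md" from markdown link targets,
// e.g. [X](.../page.md) -> [X](.../page) and [X](.../page.md#section) -> [X](.../page#section).
const stripMdExtensions = (markdown: string): string =>
  markdown.replace(/(\]\([^)\s]+?)\.md(\)|#)/g, "$1$2");
```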
To disable cleanup (not recommended):
```ts
// In your custom strategy, skip the cleanup:
return {
  answer: llmResponse, // Don't call cleanupMarkdownLinks()
  // ... rest of result
};
```
Test your strategy with various question types:
- Factual: "What models does Groq support?"
- How-to: "How do I authenticate with the API?"
- Comparison: "What's the difference between model X and Y?"
- Code examples: "Show me an example of streaming responses"
- Complex: "How do I implement rate limiting with retries?"
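A quick way to exercise these categories is a small script that hits the endpoint for each one and prints the timing breakdown (the base URL is an assumption; the fields read are from the example response above):

```ts
// Smoke-test each question type against a running answer endpoint (URL assumed).
const questions = [
  "What models does Groq support?",
  "How do I authenticate with the API?",
  "What's the difference between model X and Y?",
  "Show me an example of streaming responses",
  "How do I implement rate limiting with retries?",
];

for (const q of questions) {
  const res = await fetch("http://localhost:8000/answer?q=" + encodeURIComponent(q));
  const { metadata } = await res.json();
  console.log(`${q} -> ${metadata.timings.total.toFixed(0)}ms (llm ${metadata.timings.llm.toFixed(0)}ms)`);
}
```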
Target performance for answer strategies:
- Search: 50-500ms (depends on active search strategy)
- Context prep: <50ms
- LLM call: 500-3000ms (depends on model and response length)
- Total: 1-4s for most queries
Strategies taking >5s should be optimized or marked as "detailed" mode.
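One hedged way to check a strategy against these targets locally, calling it directly rather than over HTTP (the import path and metadata shape are assumptions):

```ts
import { answerStrategy } from "./answer/index.ts";

// Time a direct strategy call and compare against the 5s threshold above.
const start = performance.now();
const result = await answerStrategy.answer("What models does Groq support?");
const elapsed = performance.now() - start;

console.log(`${answerStrategy.name}: ${elapsed.toFixed(0)}ms`, result.metadata?.timings);
if (elapsed > 5000) console.warn("Over 5s — optimize or label this strategy as a 'detailed' mode.");
```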
