This guide will help you quickly get started with answer strategies for RAG-based question answering.
Answer strategies implement Retrieval-Augmented Generation (RAG) - they search documentation, retrieve relevant pages, and use an LLM to generate answers to user questions.
```shell
# Run comprehensive test suite with timing breakdown
deno task answer
```
This will run the test questions defined in testing/questions.ts.

To query the API interactively, start the server:

```shell
deno task serve
```
```shell
# Basic question
curl "http://localhost:8000/answer?q=How+do+I+use+the+Groq+API"

# With options
curl "http://localhost:8000/answer?q=What+models+are+available&maxContextPages=3"

# Get strategy info
curl "http://localhost:8000/answer/info"

# Run test suite via API
curl "http://localhost:8000/answer/test"
```
A user asks a question: "How do I authenticate with the Groq API?"

1. **Search** (50-500ms): the active search strategy finds relevant docs
   - Search query: "authenticate groq api"
   - Returns: top 10 matching pages ranked by relevance
2. **Retrieve** (fast, from cache): get full content for the top 5 pages
   - Loads content from latest.json (local cache)
   - Each page limited to ~2000 tokens
3. **Format** (<50ms): package the docs as LLM context
   - Formats pages with titles, URLs, and content
   - Adds a system prompt with instructions
   - Creates the user prompt with the question
4. **Generate** (500-3000ms): the LLM generates the answer
   - Calls the Groq API with Llama 3.3 70B
   - Returns a markdown-formatted answer
   - Includes citations and examples from the docs
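The Format step above can be sketched in a few lines. The `DocPage` shape and `formatContext` name below are illustrative only, not the project's actual code (the real types and helpers live in answer/types.ts and answer/utils.ts):

```typescript
// Hypothetical page shape for illustration; actual types are in answer/types.ts.
interface DocPage {
  title: string;
  url: string;
  content: string;
}

// Sketch of the Format step: package retrieved pages as LLM context,
// keeping titles and URLs so the model can cite its sources.
function formatContext(pages: DocPage[], maxPages = 5): string {
  return pages
    .slice(0, maxPages)
    .map((p, i) => `[${i + 1}] ${p.title}\n${p.url}\n\n${p.content}`)
    .join("\n\n---\n\n");
}
```

The numbered labels give the model stable handles to cite in its answer.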
```json
{
  "answer": "To authenticate with the Groq API...",
  "query": "How do I authenticate with the Groq API?",
  "searchResults": [
    {
      "path": "api/authentication",
      "url": "https://console.groq.com/docs/api/authentication",
      "title": "Authentication",
      "score": 92.5
    }
  ],
  "contextUsed": 5,
  "totalTokens": 8500,
  "metadata": {
    "strategy": "llama-3.3-70b-default",
    "model": "llama-3.3-70b-versatile",
    "temperature": 0.3,
    "searchResultsCount": 10,
    "timings": {
      "search": 45.2,
      "contextPrep": 5.1,
      "llm": 1250.3,
      "total": 1300.6
    }
  }
}
```
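On the client side, a small helper can pull out the fields you usually care about. The `AnswerResponse` interface and `summarize` helper below are a sketch mirroring the example response, not the project's actual type definitions:

```typescript
// Minimal response shape, mirroring the example response (sketch only).
interface AnswerResponse {
  answer: string;
  searchResults: { path: string; url: string; title: string; score: number }[];
  metadata: { model: string; timings: { search: number; llm: number; total: number } };
}

// One-line summary: answer preview plus top source and total latency.
function summarize(res: AnswerResponse): string {
  const top = res.searchResults[0];
  const source = top ? top.title : "no sources";
  return `${res.answer.slice(0, 40)}... [${source}, ${res.metadata.timings.total.toFixed(1)} ms]`;
}
```

The timings object is also the first place to look when responses feel slow.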
The active strategy is set in answer/index.ts:

```typescript
// Default: Llama 3.3 70B with up to 5 doc pages
import { answerStrategy } from "./llama-3.3-70b-default.ts";
```
Currently available:

- llama-3.3-70b-default - the default strategy (Llama 3.3 70B, up to 5 doc pages)
- llama-3.1-8b-fast - a faster strategy on the smaller 8B model
Coming soon (you can implement these!):
To switch, edit answer/index.ts:

```typescript
// answer/index.ts

// Comment out default
// import { answerStrategy } from "./llama-3.3-70b-default.ts";

// Use fast strategy instead
import { answerStrategy } from "./llama-3.1-8b-fast.ts";
```
- `q` (required): The question to answer
  Example: `?q=How+do+I+use+streaming`
- `limit`: Max search results (default: 10)
  Example: `?limit=20`
- `minScore`: Minimum search score, 0-100 (default: 0)
  Example: `?minScore=50`
- `maxContextPages`: Pages to send to the LLM (default: 5)
  Example: `?maxContextPages=3`
- `temperature`: LLM temperature, 0-1 (default: 0.3)
  Example: `?temperature=0.5`
- `model`: Override the model (optional)
  Example: `?model=llama-3.1-8b-instant`
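Programmatic clients can assemble these parameters with `URLSearchParams`. A minimal sketch, using the option names documented above (`buildAnswerUrl` and `AnswerQuery` are illustrative names, not part of the project):

```typescript
// Options mirror the query parameters documented above.
interface AnswerQuery {
  limit?: number;
  minScore?: number;
  maxContextPages?: number;
  temperature?: number;
  model?: string;
}

// Build an /answer request URL; undefined options are simply omitted.
function buildAnswerUrl(base: string, q: string, opts: AnswerQuery = {}): string {
  const params = new URLSearchParams({ q });
  for (const [key, value] of Object.entries(opts)) {
    if (value !== undefined) params.set(key, String(value));
  }
  return `${base}/answer?${params}`;
}

// buildAnswerUrl("http://localhost:8000", "What is Groq", { maxContextPages: 2 })
// → "http://localhost:8000/answer?q=What+is+Groq&maxContextPages=2"
```

`URLSearchParams` handles the form-style encoding (spaces become `+`), so questions can be passed as plain strings.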
```shell
# Fast mode: fewer pages, instant model
curl "http://localhost:8000/answer?q=What+is+Groq&maxContextPages=2&model=llama-3.1-8b-instant"

# High quality: more pages, higher threshold
curl "http://localhost:8000/answer?q=How+to+implement+retry+logic&limit=20&minScore=60&maxContextPages=7"

# Creative mode: higher temperature
curl "http://localhost:8000/answer?q=Ideas+for+using+Groq+API&temperature=0.8"
```
Good for: Quick facts, definitions, model names

```shell
curl "http://localhost:8000/answer?q=What+models+does+Groq+support"
```

Best settings: Default (5 pages, temp 0.3)

Good for: Implementation guides, step-by-step instructions

```shell
curl "http://localhost:8000/answer?q=How+do+I+implement+streaming"
```

Best settings: More pages (6-8), low temp (0.2-0.3)

Good for: Requesting code snippets

```shell
curl "http://localhost:8000/answer?q=Show+me+a+Python+example+of+calling+the+API"
```

Best settings: Code-focused strategy (when available), temp 0.3

Good for: Comparing models, features, approaches

```shell
curl "http://localhost:8000/answer?q=Difference+between+Llama+3.1+and+3.3"
```

Best settings: More pages (7-10), temp 0.3

Good for: Multi-part questions requiring reasoning

```shell
curl "http://localhost:8000/answer?q=How+to+build+a+chatbot+with+streaming+and+conversation+history"
```

Best settings: More pages (8-10), 70B+ model, temp 0.4
Problem: Returns an error or an empty answer

Possible causes:

- Search returned no relevant pages
- GROQ_API_KEY is missing or invalid
- The active strategy is misconfigured

Solution:
```shell
# Check if search works
curl "http://localhost:8000/search?q=your+question"

# Verify API key
echo $GROQ_API_KEY

# Check strategy info
curl "http://localhost:8000/answer/info"
```
Problem: Answer is vague, incorrect, or unhelpful

Possible causes:

- Search isn't surfacing the right pages
- Too little context sent to the LLM
- Temperature too high for factual answers

Solutions:
```shell
# Check search results first
curl "http://localhost:8000/search?q=your+question&limit=10"

# Increase context
curl "http://localhost:8000/answer?q=your+question&maxContextPages=8"

# Lower temperature
curl "http://localhost:8000/answer?q=your+question&temperature=0.2"

# Increase search quality threshold
curl "http://localhost:8000/answer?q=your+question&minScore=60"
```
Problem: Takes >5 seconds to respond

Possible causes:

- Too many context pages (large prompt)
- A large model where a smaller one would do

Solutions:
```shell
# Use fewer pages
curl "http://localhost:8000/answer?q=your+question&maxContextPages=3"

# Switch to faster model
curl "http://localhost:8000/answer?q=your+question&model=llama-3.1-8b-instant"

# Check timing breakdown in response metadata
```
```shell
cp answer/llama-3.3-70b-default.ts answer/my-custom-strategy.ts
```
```typescript
// Change these constants
const DEFAULT_MODEL = "your-preferred-model";
const DEFAULT_MAX_CONTEXT_PAGES = 3; // Adjust as needed
const DEFAULT_TEMPERATURE = 0.5;
const MAX_TOKENS_PER_PAGE = 1000; // Adjust as needed
```
```typescript
import { DEFAULT_SYSTEM_PROMPT } from "./utils.ts";

// Or create your own:
const CUSTOM_SYSTEM_PROMPT = `You are a helpful assistant that...`;
```
```typescript
export const answerStrategy: AnswerStrategy = {
  name: "my-custom-strategy",
  description: "What makes your strategy special",

  answer: async (query: string, options: AnswerOptions = {}) => {
    // Your implementation
  },
};
```
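Inside your `answer` implementation you will typically merge the caller's options with the strategy's constants before searching and calling the LLM. A sketch under assumptions: the option names mirror the query parameters, and `withDefaults` is an illustrative helper, not part of the project (check answer/types.ts for the real `AnswerOptions`):

```typescript
// Assumed option shape; the real AnswerOptions lives in answer/types.ts.
interface Options {
  limit?: number;
  minScore?: number;
  maxContextPages?: number;
  temperature?: number;
  model?: string;
}

// Strategy-level constants, as in the copied template.
const DEFAULT_MODEL = "llama-3.3-70b-versatile";
const DEFAULT_MAX_CONTEXT_PAGES = 5;
const DEFAULT_TEMPERATURE = 0.3;

// Fill in strategy defaults for anything the caller didn't set.
function withDefaults(options: Options): Required<Options> {
  return {
    limit: options.limit ?? 10,
    minScore: options.minScore ?? 0,
    maxContextPages: options.maxContextPages ?? DEFAULT_MAX_CONTEXT_PAGES,
    temperature: options.temperature ?? DEFAULT_TEMPERATURE,
    model: options.model ?? DEFAULT_MODEL,
  };
}
```

Resolving defaults in one place keeps per-request overrides (like `?temperature=0.5`) from leaking into the rest of the pipeline.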
Edit answer/index.ts:

```typescript
import { answerStrategy } from "./my-custom-strategy.ts";
```
```shell
curl "http://localhost:8000/answer/info"  # Verify active strategy
curl "http://localhost:8000/answer?q=test+question"
```
See /search/ for how document retrieval works (and /search/test to run the search test suite). Check out:
- answer/README.md - Detailed documentation
- answer/types.ts - Type definitions
- answer/utils.ts - Helper functions
- main.tsx - How endpoints are implemented