This guide will help you quickly get started with answer strategies for RAG-based question answering.
Answer strategies implement Retrieval-Augmented Generation (RAG) - they search documentation, retrieve relevant pages, and use an LLM to generate answers to user questions.
```bash
# Run comprehensive test suite with timing breakdown
deno task answer
```
This will:

- Load test questions from `testing/questions.ts`
- Run each question through the active answer strategy
- Show a detailed timing breakdown (search, context prep, LLM call)
- Display the search results used for context
- Show generated answers
- Provide summary statistics
Start the API server:

```bash
deno task serve
```
```bash
# Basic question
curl "http://localhost:8000/answer?q=How+do+I+use+the+Groq+API"

# With options
curl "http://localhost:8000/answer?q=What+models+are+available&maxContextPages=3"

# Get strategy info
curl "http://localhost:8000/answer/info"

# Run test suite via API
curl "http://localhost:8000/answer/test"
```
- **User asks a question**: "How do I authenticate with the Groq API?"
- **Search (50-500ms)**: The active search strategy finds relevant docs
  - Search query: "authenticate groq api"
  - Returns: top 10 matching pages ranked by relevance
- **Retrieve (fast, from cache)**: Gets full content for the top 5 pages
  - Loads content from `latest.json` (local cache)
  - Each page limited to ~2000 tokens
- **Format (<50ms)**: Packages the docs as LLM context
  - Formats pages with titles, URLs, and content
  - Adds a system prompt with instructions
  - Creates a user prompt with the question
- **Generate (500-3000ms)**: The LLM generates the answer
  - Calls the Groq API with Llama 3.3 70B
  - Returns a markdown-formatted answer
  - Includes citations and examples from the docs
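Condensed into code, the pipeline looks roughly like this. A minimal TypeScript sketch; every helper here (`searchDocs`, `loadPageContent`, `formatContext`, `callLlm`) is a hypothetical placeholder for illustration, not a function from this repo:

```ts
// Hypothetical sketch of the four pipeline stages; helper functions are
// placeholders for illustration, not this repo's actual utils.
type SearchHit = { path: string; title: string; url: string; score: number };

declare function searchDocs(q: string, opts: { limit: number }): Promise<SearchHit[]>;
declare function loadPageContent(path: string): Promise<string>; // reads the latest.json cache
declare function formatContext(pages: string[]): string;         // titles, URLs, content
declare function callLlm(req: { model: string; system: string; user: string }): Promise<string>;

async function answerQuestion(query: string): Promise<string> {
  // 1. Search: top 10 pages ranked by relevance
  const hits = await searchDocs(query, { limit: 10 });

  // 2. Retrieve: full content for the top 5 pages, from the local cache
  const pages = await Promise.all(
    hits.slice(0, 5).map((h) => loadPageContent(h.path)),
  );

  // 3. Format: package the docs plus prompts as LLM context
  const context = formatContext(pages);

  // 4. Generate: markdown answer grounded in the docs
  return callLlm({
    model: "llama-3.3-70b-versatile",
    system: "Answer using only the provided documentation.",
    user: `${context}\n\nQuestion: ${query}`,
  });
}
```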
{ "answer": "To authenticate with the Groq API...", "query": "How do I authenticate with the Groq API?", "searchResults": [ { "path": "api/authentication", "url": "https://console.groq.com/docs/api/authentication", "title": "Authentication", "score": 92.5 } ], "contextUsed": 5, "totalTokens": 8500, "metadata": { "strategy": "llama-3.3-70b-default", "model": "llama-3.3-70b-versatile", "temperature": 0.3, "searchResultsCount": 10, "timings": { "search": 45.2, "contextPrep": 5.1, "llm": 1250.3, "total": 1300.6 } } }
- `answer`: The LLM-generated answer (markdown formatted)
- `query`: Your original question
- `searchResults`: The top pages used as context (with scores)
- `contextUsed`: Number of pages actually sent to the LLM
- `totalTokens`: Approximate token count of the context
- `metadata.timings`: Performance breakdown in milliseconds
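Calling the endpoint from code instead of curl is straightforward; the fields follow the shape shown above. A small Deno example:

```ts
// Ask a question and print the fields described above.
const url = new URL("http://localhost:8000/answer");
url.searchParams.set("q", "How do I authenticate with the Groq API?");

const data = await (await fetch(url)).json();

console.log(data.answer); // markdown-formatted answer
console.log(`pages used: ${data.contextUsed}, tokens: ${data.totalTokens}`);
console.log(`llm: ${data.metadata.timings.llm}ms / total: ${data.metadata.timings.total}ms`);
```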
The active strategy is set in `answer/index.ts`:

```ts
// Default: Llama 3.3 70B with up to 5 doc pages
import { answerStrategy } from "./llama-3.3-70b-default.ts";
```
Currently available:

- ✅ `llama-3.3-70b-default`: Balanced quality and speed

Coming soon (you can implement these!):

- ⚡ `llama-3.1-8b-fast`: Faster responses with a smaller model
- 📚 `mixtral-8x7b-extended`: More context with a larger model
- 💻 `llama-3.3-70b-code`: Optimized for code examples
- 🔗 `citation-mode`: Includes specific citations
To switch strategies:

- Comment out the current strategy in `answer/index.ts`
- Uncomment the strategy you want to use
- Restart the server
```ts
// answer/index.ts

// Comment out the default
// import { answerStrategy } from "./llama-3.3-70b-default.ts";

// Use the fast strategy instead
import { answerStrategy } from "./llama-3.1-8b-fast.ts";
```
- `q`: The question to answer
  `?q=How+do+I+use+streaming`
- `limit`: Max search results (default: 10)
  `?limit=20`
- `minScore`: Minimum search score, 0-100 (default: 0)
  `?minScore=50`
- `maxContextPages`: Pages to send to the LLM (default: 5)
  `?maxContextPages=3`
- `temperature`: LLM temperature, 0-1 (default: 0.3)
  `?temperature=0.5`
- `model`: Override the model (optional)
  `?model=llama-3.1-8b-instant`
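If you build these URLs in code, `URLSearchParams` takes care of the encoding:

```ts
// Compose an /answer request from the parameters above.
const params = new URLSearchParams({
  q: "How do I implement streaming?",
  maxContextPages: "3",
  temperature: "0.2",
  model: "llama-3.1-8b-instant",
});

const res = await fetch(`http://localhost:8000/answer?${params}`);
console.log((await res.json()).answer);
```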
```bash
# Fast mode: fewer pages, instant model
curl "http://localhost:8000/answer?q=What+is+Groq&maxContextPages=2&model=llama-3.1-8b-instant"

# High quality: more pages, higher threshold
curl "http://localhost:8000/answer?q=How+to+implement+retry+logic&limit=20&minScore=60&maxContextPages=7"

# Creative mode: higher temperature
curl "http://localhost:8000/answer?q=Ideas+for+using+Groq+API&temperature=0.8"
```
Good for: Quick facts, definitions, model names
curl "http://localhost:8000/answer?q=What+models+does+Groq+support"
Best settings: Default (5 pages, temp 0.3)
Good for: Implementation guides, step-by-step instructions
curl "http://localhost:8000/answer?q=How+do+I+implement+streaming"
Best settings: More pages (6-8), low temp (0.2-0.3)
Good for: Requesting code snippets
curl "http://localhost:8000/answer?q=Show+me+a+Python+example+of+calling+the+API"
Best settings: Code-focused strategy (when available), temp 0.3
Good for: Comparing models, features, approaches
curl "http://localhost:8000/answer?q=Difference+between+Llama+3.1+and+3.3"
Best settings: More pages (7-10), temp 0.3
Good for: Multi-part questions requiring reasoning
curl "http://localhost:8000/answer?q=How+to+build+a+chatbot+with+streaming+and+conversation+history"
Best settings: More pages (8-10), 70B+ model, temp 0.4
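To avoid retyping these settings, you could keep them as presets. A sketch whose values simply mirror the recommendations above:

```ts
// Parameter presets per question type (values mirror the guidance above).
const presets = {
  factual:    { maxContextPages: "5", temperature: "0.3" },
  howTo:      { maxContextPages: "7", temperature: "0.2" },
  code:       { maxContextPages: "5", temperature: "0.3" },
  comparison: { maxContextPages: "8", temperature: "0.3" },
  complex:    { maxContextPages: "9", temperature: "0.4" },
} as const;

function answerUrl(q: string, kind: keyof typeof presets): string {
  const params = new URLSearchParams({ q, ...presets[kind] });
  return `http://localhost:8000/answer?${params}`;
}

console.log(answerUrl("Difference between Llama 3.1 and 3.3", "comparison"));
```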
Problem: Returns error or empty answer
Possible causes:
- No search results found
- GROQ_API_KEY not set
- Rate limit exceeded
Solution:
```bash
# Check if search works
curl "http://localhost:8000/search?q=your+question"

# Verify API key
echo $GROQ_API_KEY

# Check strategy info
curl "http://localhost:8000/answer/info"
```
Problem: Answer is vague, incorrect, or unhelpful
Possible causes:
- Search results not relevant
- Not enough context (too few pages)
- Temperature too high
Solutions:
```bash
# Check search results first
curl "http://localhost:8000/search?q=your+question&limit=10"

# Increase context
curl "http://localhost:8000/answer?q=your+question&maxContextPages=8"

# Lower temperature
curl "http://localhost:8000/answer?q=your+question&temperature=0.2"

# Increase search quality threshold
curl "http://localhost:8000/answer?q=your+question&minScore=60"
```
Problem: Takes >5 seconds to respond
Possible causes:
- Slow search strategy
- Too many pages in context
- Large model
Solutions:
```bash
# Use fewer pages
curl "http://localhost:8000/answer?q=your+question&maxContextPages=3"

# Switch to a faster model
curl "http://localhost:8000/answer?q=your+question&model=llama-3.1-8b-instant"

# Check the timing breakdown in the response metadata
```
Copy an existing strategy as a starting point:

```bash
cp answer/llama-3.3-70b-default.ts answer/my-custom-strategy.ts
```
```ts
// Change these constants
const DEFAULT_MODEL = "your-preferred-model";
const DEFAULT_MAX_CONTEXT_PAGES = 3; // Adjust as needed
const DEFAULT_TEMPERATURE = 0.5;
const MAX_TOKENS_PER_PAGE = 1000; // Adjust as needed
```
```ts
import { DEFAULT_SYSTEM_PROMPT } from "./utils.ts";

// Or create your own:
const CUSTOM_SYSTEM_PROMPT = `You are a helpful assistant that...`;
```
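As one illustrative (not canonical) example of a fuller prompt, tying in the grounding and citation behavior described earlier:

```ts
// Illustrative custom prompt -- adjust to your use case.
const CUSTOM_SYSTEM_PROMPT = `You are a documentation assistant for the Groq API.
Answer only from the documentation pages provided in the context.
Format answers in markdown, include code examples where the docs have them,
cite the page URLs you relied on, and say plainly when the context
does not contain the answer.`;
```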
```ts
export const answerStrategy: AnswerStrategy = {
  name: "my-custom-strategy",
  description: "What makes your strategy special",
  answer: async (query: string, options: AnswerOptions = {}) => {
    // Your implementation
  },
};
```
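Inside `answer`, a common first step is merging the caller's options with the constants you defined above. A sketch of that merging (illustrative, not the repo's exact code):

```ts
answer: async (query: string, options: AnswerOptions = {}) => {
  // Fall back to this strategy's defaults when the caller omits an option.
  const model = options.model ?? DEFAULT_MODEL;
  const temperature = options.temperature ?? DEFAULT_TEMPERATURE;
  const maxContextPages = options.maxContextPages ?? DEFAULT_MAX_CONTEXT_PAGES;

  // ...then search, build context, call the LLM, and return the
  // response fields described earlier (answer, searchResults, timings, ...).
},
```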
Edit `answer/index.ts`:

```ts
import { answerStrategy } from "./my-custom-strategy.ts";
```
curl "http://localhost:8000/answer/info" # Verify active strategy curl "http://localhost:8000/answer?q=test+question"
For faster responses:

- Use faster models (8B instead of 70B)
- Reduce context pages (2-3 instead of 5)
- Use a faster search strategy
- Lower the token limit per page

For higher quality:

- Use larger models (70B+)
- Increase context pages (7-10)
- Use higher search score thresholds
- Include more search results to choose from

General tips:

- Start with the default (5 pages, 70B, temp 0.3)
- Monitor the timing breakdown in responses
- Adjust based on question complexity
- Cache common questions (future enhancement; see the sketch below)
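For the caching idea, even a small in-memory map in front of the endpoint helps with repeated questions. A hypothetical sketch (nothing like this ships in the repo):

```ts
// Naive in-memory cache for repeated questions (hypothetical sketch).
const cache = new Map<string, { answer: string; at: number }>();
const TTL_MS = 10 * 60 * 1000; // expire entries after 10 minutes

async function cachedAnswer(q: string): Promise<string> {
  const key = q.trim().toLowerCase();
  const hit = cache.get(key);
  if (hit && Date.now() - hit.at < TTL_MS) return hit.answer;

  const res = await fetch(`http://localhost:8000/answer?q=${encodeURIComponent(q)}`);
  const { answer } = await res.json();
  cache.set(key, { answer, at: Date.now() });
  return answer;
}
```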
- ✅ Test the default strategy
- ✅ Try different query parameters
- ✅ Check the README.md for more details
- ⚡ Create a fast strategy for simple questions
- 📚 Create an extended strategy for complex questions
- 🔧 Customize system prompts for your use case
- 📊 Build a testing harness (like `/search/test`)
- Search Strategies: See `/search/` for how document retrieval works
- Groq Models: https://console.groq.com/docs/models
- API Reference: https://console.groq.com/docs/api-reference
Check out:
- `answer/README.md` - Detailed documentation
- `answer/types.ts` - Type definitions
- `answer/utils.ts` - Helper functions
- `main.tsx` - How endpoints are implemented
