This guide will help you quickly get started with answer strategies for RAG-based question answering.
Answer strategies implement Retrieval-Augmented Generation (RAG) - they search documentation, retrieve relevant pages, and use an LLM to generate answers to user questions.
```shell
# Run comprehensive test suite with timing breakdown
deno task answer
```
This will run the test questions defined in testing/questions.ts.

To query the API interactively, start the server:

```shell
deno task serve
```
```shell
# Basic question
curl "http://localhost:8000/answer?q=How+do+I+use+the+Groq+API"

# With options
curl "http://localhost:8000/answer?q=What+models+are+available&maxContextPages=3"

# Get strategy info
curl "http://localhost:8000/answer/info"

# Run test suite via API
curl "http://localhost:8000/answer/test"
```
A user asks a question: "How do I authenticate with the Groq API?"

1. **Search** (50-500ms): the active search strategy finds relevant docs
   - Search query: "authenticate groq api"
   - Returns: top 10 matching pages ranked by relevance
2. **Retrieve** (fast, from cache): get full content for the top 5 pages
   - Loads content from latest.json (local cache)
   - Each page limited to ~2000 tokens
3. **Format** (<50ms): package the docs as LLM context
   - Formats pages with titles, URLs, and content
   - Adds a system prompt with instructions
   - Creates the user prompt with the question
4. **Generate** (500-3000ms): the LLM generates the answer
   - Calls the Groq API with Llama 3.3 70B
   - Returns a markdown-formatted answer
   - Includes citations and examples from the docs
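The Format step above can be sketched in a few lines. The `DocPage` shape and `formatContext` name below are illustrative only, not the project's actual code (the real types and helpers live in answer/types.ts and answer/utils.ts):

```typescript
// Hypothetical page shape for illustration; actual types are in answer/types.ts.
interface DocPage {
  title: string;
  url: string;
  content: string;
}

// Sketch of the Format step: package retrieved pages as LLM context,
// keeping titles and URLs so the model can cite its sources.
function formatContext(pages: DocPage[], maxPages = 5): string {
  return pages
    .slice(0, maxPages)
    .map((p, i) => `[${i + 1}] ${p.title}\n${p.url}\n\n${p.content}`)
    .join("\n\n---\n\n");
}
```

The numbered labels give the model stable handles to cite in its answer.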
```json
{
  "answer": "To authenticate with the Groq API...",
  "query": "How do I authenticate with the Groq API?",
  "searchResults": [
    {
      "path": "api/authentication",
      "url": "https://console.groq.com/docs/api/authentication",
      "title": "Authentication",
      "score": 92.5
    }
  ],
  "contextUsed": 5,
  "totalTokens": 8500,
  "metadata": {
    "strategy": "llama-3.3-70b-default",
    "model": "llama-3.3-70b-versatile",
    "temperature": 0.3,
    "searchResultsCount": 10,
    "timings": {
      "search": 45.2,
      "contextPrep": 5.1,
      "llm": 1250.3,
      "total": 1300.6
    }
  }
}
```
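On the client side, a small helper can pull out the fields you usually care about. The `AnswerResponse` interface and `summarize` helper below are a sketch mirroring the example response, not the project's actual type definitions:

```typescript
// Minimal response shape, mirroring the example response (sketch only).
interface AnswerResponse {
  answer: string;
  searchResults: { path: string; url: string; title: string; score: number }[];
  metadata: { model: string; timings: { search: number; llm: number; total: number } };
}

// One-line summary: answer preview plus top source and total latency.
function summarize(res: AnswerResponse): string {
  const top = res.searchResults[0];
  const source = top ? top.title : "no sources";
  return `${res.answer.slice(0, 40)}... [${source}, ${res.metadata.timings.total.toFixed(1)} ms]`;
}
```

The timings object is also the first place to look when responses feel slow.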
The active strategy is set in answer/index.ts:

```typescript
// Default: Llama 3.3 70B with up to 5 doc pages
import { answerStrategy } from "./llama-3.3-70b-default.ts";
```
Currently available:

- llama-3.3-70b-default - the default strategy (Llama 3.3 70B, up to 5 doc pages)
- llama-3.1-8b-fast - a faster strategy on the smaller 8B model
Coming soon (you can implement these!):
To switch, edit answer/index.ts:

```typescript
// answer/index.ts

// Comment out default
// import { answerStrategy } from "./llama-3.3-70b-default.ts";

// Use fast strategy instead
import { answerStrategy } from "./llama-3.1-8b-fast.ts";
```
- `q` (required): The question to answer
  Example: `?q=How+do+I+use+streaming`
- `limit`: Max search results (default: 10)
  Example: `?limit=20`
- `minScore`: Minimum search score, 0-100 (default: 0)
  Example: `?minScore=50`
- `maxContextPages`: Pages to send to the LLM (default: 5)
  Example: `?maxContextPages=3`
- `temperature`: LLM temperature, 0-1 (default: 0.3)
  Example: `?temperature=0.5`
- `model`: Override the model (optional)
  Example: `?model=llama-3.1-8b-instant`
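Programmatic clients can assemble these parameters with `URLSearchParams`. A minimal sketch, using the option names documented above (`buildAnswerUrl` and `AnswerQuery` are illustrative names, not part of the project):

```typescript
// Options mirror the query parameters documented above.
interface AnswerQuery {
  limit?: number;
  minScore?: number;
  maxContextPages?: number;
  temperature?: number;
  model?: string;
}

// Build an /answer request URL; undefined options are simply omitted.
function buildAnswerUrl(base: string, q: string, opts: AnswerQuery = {}): string {
  const params = new URLSearchParams({ q });
  for (const [key, value] of Object.entries(opts)) {
    if (value !== undefined) params.set(key, String(value));
  }
  return `${base}/answer?${params}`;
}

// buildAnswerUrl("http://localhost:8000", "What is Groq", { maxContextPages: 2 })
// → "http://localhost:8000/answer?q=What+is+Groq&maxContextPages=2"
```

`URLSearchParams` handles the form-style encoding (spaces become `+`), so questions can be passed as plain strings.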
```shell
# Fast mode: fewer pages, instant model
curl "http://localhost:8000/answer?q=What+is+Groq&maxContextPages=2&model=llama-3.1-8b-instant"

# High quality: more pages, higher threshold
curl "http://localhost:8000/answer?q=How+to+implement+retry+logic&limit=20&minScore=60&maxContextPages=7"

# Creative mode: higher temperature
curl "http://localhost:8000/answer?q=Ideas+for+using+Groq+API&temperature=0.8"
```
Good for: Quick facts, definitions, model names

```shell
curl "http://localhost:8000/answer?q=What+models+does+Groq+support"
```

Best settings: Default (5 pages, temp 0.3)

Good for: Implementation guides, step-by-step instructions

```shell
curl "http://localhost:8000/answer?q=How+do+I+implement+streaming"
```

Best settings: More pages (6-8), low temp (0.2-0.3)

Good for: Requesting code snippets

```shell
curl "http://localhost:8000/answer?q=Show+me+a+Python+example+of+calling+the+API"
```

Best settings: Code-focused strategy (when available), temp 0.3

Good for: Comparing models, features, approaches

```shell
curl "http://localhost:8000/answer?q=Difference+between+Llama+3.1+and+3.3"
```

Best settings: More pages (7-10), temp 0.3

Good for: Multi-part questions requiring reasoning

```shell
curl "http://localhost:8000/answer?q=How+to+build+a+chatbot+with+streaming+and+conversation+history"
```

Best settings: More pages (8-10), 70B+ model, temp 0.4
Problem: Returns an error or an empty answer

Possible causes:

- Search returned no relevant pages
- GROQ_API_KEY is missing or invalid
- The active strategy is misconfigured

Solution:
```shell
# Check if search works
curl "http://localhost:8000/search?q=your+question"

# Verify API key
echo $GROQ_API_KEY

# Check strategy info
curl "http://localhost:8000/answer/info"
```
Problem: Answer is vague, incorrect, or unhelpful

Possible causes:

- Search isn't surfacing the right pages
- Too little context sent to the LLM
- Temperature too high for factual answers

Solutions:
```shell
# Check search results first
curl "http://localhost:8000/search?q=your+question&limit=10"

# Increase context
curl "http://localhost:8000/answer?q=your+question&maxContextPages=8"

# Lower temperature
curl "http://localhost:8000/answer?q=your+question&temperature=0.2"

# Increase search quality threshold
curl "http://localhost:8000/answer?q=your+question&minScore=60"
```
Problem: Takes >5 seconds to respond

Possible causes:

- Too many context pages (large prompt)
- A large model where a smaller one would do

Solutions:
```shell
# Use fewer pages
curl "http://localhost:8000/answer?q=your+question&maxContextPages=3"

# Switch to faster model
curl "http://localhost:8000/answer?q=your+question&model=llama-3.1-8b-instant"

# Check timing breakdown in response metadata
```
```shell
cp answer/llama-3.3-70b-default.ts answer/my-custom-strategy.ts
```
```typescript
// Change these constants
const DEFAULT_MODEL = "your-preferred-model";
const DEFAULT_MAX_CONTEXT_PAGES = 3; // Adjust as needed
const DEFAULT_TEMPERATURE = 0.5;
const MAX_TOKENS_PER_PAGE = 1000; // Adjust as needed
```
```typescript
import { DEFAULT_SYSTEM_PROMPT } from "./utils.ts";

// Or create your own:
const CUSTOM_SYSTEM_PROMPT = `You are a helpful assistant that...`;
```
```typescript
export const answerStrategy: AnswerStrategy = {
  name: "my-custom-strategy",
  description: "What makes your strategy special",

  answer: async (query: string, options: AnswerOptions = {}) => {
    // Your implementation
  },
};
```
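Inside your `answer` implementation you will typically merge the caller's options with the strategy's constants before searching and calling the LLM. A sketch under assumptions: the option names mirror the query parameters, and `withDefaults` is an illustrative helper, not part of the project (check answer/types.ts for the real `AnswerOptions`):

```typescript
// Assumed option shape; the real AnswerOptions lives in answer/types.ts.
interface Options {
  limit?: number;
  minScore?: number;
  maxContextPages?: number;
  temperature?: number;
  model?: string;
}

// Strategy-level constants, as in the copied template.
const DEFAULT_MODEL = "llama-3.3-70b-versatile";
const DEFAULT_MAX_CONTEXT_PAGES = 5;
const DEFAULT_TEMPERATURE = 0.3;

// Fill in strategy defaults for anything the caller didn't set.
function withDefaults(options: Options): Required<Options> {
  return {
    limit: options.limit ?? 10,
    minScore: options.minScore ?? 0,
    maxContextPages: options.maxContextPages ?? DEFAULT_MAX_CONTEXT_PAGES,
    temperature: options.temperature ?? DEFAULT_TEMPERATURE,
    model: options.model ?? DEFAULT_MODEL,
  };
}
```

Resolving defaults in one place keeps per-request overrides (like `?temperature=0.5`) from leaking into the rest of the pipeline.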
Edit answer/index.ts:

```typescript
import { answerStrategy } from "./my-custom-strategy.ts";
```
```shell
curl "http://localhost:8000/answer/info"  # Verify active strategy
curl "http://localhost:8000/answer?q=test+question"
```
See /search/ for how document retrieval works (and /search/test to run the search test suite). Check out:
- answer/README.md - Detailed documentation
- answer/types.ts - Type definitions
- answer/utils.ts - Helper functions
- main.tsx - How endpoints are implemented