This document helps you choose the right answer strategy for your use case.
Best for: General-purpose Q&A, complex questions, high-quality answers

Configuration: `llama-3.3-70b-versatile`

Performance:
- Quality: ⭐⭐⭐⭐⭐ (5/5)
- Speed: ⭐⭐⭐ (3/5)
- Cost: ⭐⭐⭐ (3/5)

Pros:
Cons:
Use cases:
Best for: Simple factual questions, quick lookups, high-volume traffic
Configuration: `llama-3.1-8b-instant`

Expected Performance:
- Quality: ⭐⭐⭐⭐ (4/5)
- Speed: ⭐⭐⭐⭐⭐ (5/5)
- Cost: ⭐⭐⭐⭐⭐ (5/5)
Best for:
Best for: Very complex questions requiring lots of context
Configuration: `mixtral-8x7b-32768`

Expected Performance:
- Quality: ⭐⭐⭐⭐⭐ (5/5)
- Speed: ⭐⭐ (2/5)
- Cost: ⭐⭐ (2/5)
Best for:
Best for: Code generation and technical implementation questions
Configuration: `llama-3.3-70b-versatile`

Expected Performance:
- Quality: ⭐⭐⭐⭐⭐ (5/5) for code
- Speed: ⭐⭐⭐ (3/5)
- Cost: ⭐⭐⭐ (3/5)
Best for:
Best for: Academic/research use, when you need to cite sources
Configuration: `llama-3.3-70b-versatile`

Expected Performance:
- Quality: ⭐⭐⭐⭐⭐ (5/5)
- Speed: ⭐⭐⭐ (3/5)
- Cost: ⭐⭐⭐ (3/5)
Best for:
| Strategy | Speed | Quality | Cost | Context | Best For |
|---|---|---|---|---|---|
| llama-3.3-70b-default ✅ | Medium | Excellent | Medium | 5 pages | General purpose, complex Q&A |
| llama-3.1-8b-fast | Fast | Good | Low | 2-3 pages | Simple lookups, high volume |
| mixtral-8x7b-extended | Slow | Excellent | High | 10 pages | Very complex questions |
| llama-3.3-70b-code | Medium | Excellent | Medium | 5 pages | Code generation |
| citation-mode | Medium | Excellent | Medium | 5 pages | Research, documentation |
Is the question simple and factual?
→ llama-3.1-8b-fast (when implemented)

Does it require lots of documentation context?
→ mixtral-8x7b-extended (when implemented)

Is it primarily about code/implementation?
→ llama-3.3-70b-code (when implemented)

Do you need citations?
→ citation-mode (when implemented)

Otherwise → llama-3.3-70b-default ✅

Factual Questions ("What is X?")
How-To Questions ("How do I X?")
Complex Questions ("How do I build X with Y and Z?")
Code Examples ("Show me code for X")
Comparison Questions ("What's the difference between X and Y?")
Need fastest response (<1s goal):
- llama-3.1-8b-fast (when implemented)
- Set `maxContextPages` to 2

Need best quality (no time constraint):
- mixtral-8x7b-extended (when implemented)
- Set `maxContextPages` to 10 and `minScore` to 60+

Balanced (1-3s acceptable):
- llama-3.3-70b-default ✅ (current default)

High volume, many simple questions:
- llama-3.1-8b-fast

Low volume, complex questions:
- llama-3.3-70b-default or mixtral-8x7b-extended

Mixed traffic:
Approximate token costs per answer (based on Groq pricing):
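Per-token prices change over time, so a minimal sketch of estimating cost per answer is shown below; the prices are placeholder assumptions, not current Groq rates:

```typescript
// Hypothetical per-million-token prices — check current Groq rates before use.
const PRICE_PER_MTOKEN = { input: 0.59, output: 0.79 };

// Estimate the dollar cost of one answer from its token usage.
function estimateCost(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens * PRICE_PER_MTOKEN.input +
      outputTokens * PRICE_PER_MTOKEN.output) / 1_000_000
  );
}
```

Token usage per answer is dominated by the context pages sent as input, which is why the tuning knobs below target context size.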
To reduce cost, lower `maxContextPages` or raise `minScore` (fewer pages in context).

Copy template:
```shell
cp answer/llama-3.3-70b-default.ts answer/my-strategy.ts
```
Modify configuration:
```typescript
const DEFAULT_MODEL = "your-model";
const DEFAULT_MAX_CONTEXT_PAGES = 3;
const DEFAULT_TEMPERATURE = 0.5;
```
Update strategy metadata:
```typescript
export const answerStrategy: AnswerStrategy = {
  name: "my-strategy",
  description: "What makes it special",
  answer: async (query, options) => { /* ... */ }
};
```
Activate:
```typescript
// answer/index.ts
import { answerStrategy } from "./my-strategy.ts";
```
Test:
```shell
curl "http://localhost:8000/answer/info"
curl "http://localhost:8000/answer?q=test"
```
Create a comparison test script:
```typescript
// testing/answer-comparison.ts
import { questions } from "./questions.ts";

const strategies = [
  "llama-3.3-70b-default",
  "llama-3.1-8b-fast",
  "mixtral-8x7b-extended"
];

for (const strategy of strategies) {
  for (const question of questions) {
    // Switch strategy
    // Run question
    // Collect metrics
  }
}

// Compare results
```
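One way to fill in the "collect metrics" step is to time each question end-to-end against the local `/answer` endpoint; the `answerUrl` helper and base URL below are assumptions for this sketch, not part of the existing API:

```typescript
// Hypothetical helper: build the query URL for the local answer endpoint.
function answerUrl(base: string, q: string): string {
  return `${base}/answer?q=${encodeURIComponent(q)}`;
}

// Time a single question end-to-end, in milliseconds.
async function timeQuestion(base: string, q: string): Promise<number> {
  const start = performance.now();
  await fetch(answerUrl(base, q));
  return performance.now() - start;
}
```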
Response time breakdown:
Quality metrics:
Cost metrics:
Usage patterns:
Speed: set `enableTiming: true`
Quality:
Cost:
Automatically select strategy based on question analysis:
```typescript
async function smartRoute(question: string) {
  const complexity = analyzeComplexity(question);
  const type = classifyQuestion(question);

  if (complexity === "simple" && type === "factual") {
    return "llama-3.1-8b-fast";
  } else if (type === "code") {
    return "llama-3.3-70b-code";
  } else if (complexity === "complex") {
    return "mixtral-8x7b-extended";
  }
  return "llama-3.3-70b-default";
}
```
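`analyzeComplexity` and `classifyQuestion` are left undefined above; a minimal keyword-based sketch might look like this (the heuristics and thresholds are illustrative assumptions, not the project's actual logic):

```typescript
// Illustrative heuristic: long or multi-part questions count as complex.
function analyzeComplexity(question: string): "simple" | "complex" {
  const clauses = question.split(/\band\b|,/).length;
  return question.length > 120 || clauses > 2 ? "complex" : "simple";
}

// Illustrative heuristic: classify by keywords and question openers.
function classifyQuestion(question: string): "factual" | "code" | "other" {
  const q = question.toLowerCase();
  if (/\b(code|implement|function|snippet)\b/.test(q)) return "code";
  if (q.startsWith("what is") || q.startsWith("who")) return "factual";
  return "other";
}
```

In practice these heuristics could be replaced by a cheap classifier model, trading a small extra latency for better routing accuracy.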
Cache answers for common questions:
```typescript
const cachedAnswer = await cache.get(questionHash);
if (cachedAnswer) return cachedAnswer;

const answer = await generateAnswer(question);
await cache.set(questionHash, answer, ttl);
```
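`questionHash` is not defined in the snippet above; one way to derive a stable cache key is to normalize the question text and hash it (the normalization and 32-bit FNV-1a hash here are an assumption for illustration):

```typescript
// Normalize the question, then hash with 32-bit FNV-1a for a stable cache key.
function questionKey(q: string): string {
  const norm = q.trim().toLowerCase().replace(/\s+/g, " ");
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < norm.length; i++) {
    h ^= norm.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // FNV prime, kept unsigned
  }
  return h.toString(16).padStart(8, "0");
}
```

Normalizing first means trivially different phrasings ("What is X?" vs. "what is x?") share a cache entry.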
Support conversation context:
```typescript
await answerQuestion(question, {
  conversationHistory: previousQA,
  maxContextPages: 3 // Reduced to fit history
});
```
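As history grows it has to be trimmed to fit the reduced context budget; a hedged sketch that keeps the most recent turns under a rough character budget (the `QAPair` shape and the budget value are assumptions):

```typescript
interface QAPair { question: string; answer: string; }

// Keep the most recent turns whose combined size fits a rough character budget.
function trimHistory(history: QAPair[], maxChars = 4000): QAPair[] {
  const kept: QAPair[] = [];
  let used = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    const size = history[i].question.length + history[i].answer.length;
    if (used + size > maxChars) break;
    kept.unshift(history[i]); // prepend to preserve chronological order
    used += size;
  }
  return kept;
}
```

A token-based budget (via the model's tokenizer) would be more precise; characters are used here only to keep the sketch dependency-free.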