Answer Strategy Comparison

This document helps you choose the right answer strategy for your use case.

Current Strategies

1. llama-3.3-70b-default ✅ (Active)

Best for: General-purpose Q&A, complex questions, high-quality answers

Configuration:

  • Model: llama-3.3-70b-versatile
  • Max context pages: 5
  • Temperature: 0.3
  • Tokens per page: ~2000

Performance:

  • Search: 50-500ms (depends on search strategy)
  • Context prep: <50ms
  • LLM call: 500-3000ms
  • Total: ~1-3.5s

Quality: ⭐⭐⭐⭐⭐ (5/5) Speed: ⭐⭐⭐ (3/5) Cost: ⭐⭐⭐ (3/5)

Pros:

  • ✅ Excellent reasoning and comprehension
  • ✅ Handles complex multi-part questions
  • ✅ Good at code examples and technical details
  • ✅ Consistent, reliable answers

Cons:

  • ❌ Slower than 8B models
  • ❌ More expensive per token
  • ❌ May be overkill for simple questions

Use cases:

  • Complex how-to questions
  • Multi-step implementations
  • Comparing features/models
  • Debugging and troubleshooting
  • Code generation with explanations
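
For orientation, here is a minimal sketch of how this configuration might translate into a Groq chat-completions call. This is not the project's actual implementation in answer/llama-3.3-70b-default.ts; it assumes Groq's OpenAI-compatible REST endpoint and a GROQ_API_KEY environment variable.

// Sketch only - not the actual code in answer/llama-3.3-70b-default.ts.
async function callGroq(systemPrompt: string, userPrompt: string): Promise<string> {
  const res = await fetch("https://api.groq.com/openai/v1/chat/completions", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${Deno.env.get("GROQ_API_KEY")}`, // env var name assumed
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "llama-3.3-70b-versatile", // DEFAULT_MODEL
      temperature: 0.3, // DEFAULT_TEMPERATURE
      messages: [
        { role: "system", content: systemPrompt },
        { role: "user", content: userPrompt },
      ],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content as string;
}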

Future Strategies (Templates)

2. llama-3.1-8b-fast (Not yet implemented)

Best for: Simple factual questions, quick lookups, high-volume traffic

Configuration:

  • Model: llama-3.1-8b-instant
  • Max context pages: 2-3
  • Temperature: 0.3
  • Tokens per page: ~1000

Expected Performance:

  • Search: 50-500ms
  • Context prep: <30ms
  • LLM call: 200-800ms
  • Total: ~0.3-1.3s

Quality: ⭐⭐⭐⭐ (4/5) Speed: ⭐⭐⭐⭐⭐ (5/5) Cost: ⭐⭐⭐⭐⭐ (5/5)

Best for:

  • "What models are available?"
  • "What is X?"
  • "How much does Y cost?"
  • Simple API lookups

3. mixtral-8x7b-extended (Not yet implemented)

Best for: Very complex questions requiring lots of context

Configuration:

  • Model: mixtral-8x7b-32768
  • Max context pages: 10
  • Temperature: 0.3
  • Tokens per page: ~3000

Expected Performance:

  • Search: 50-500ms
  • Context prep: ~100ms
  • LLM call: 1000-4000ms
  • Total: ~1.5-5s

Quality: ⭐⭐⭐⭐⭐ (5/5) Speed: ⭐⭐ (2/5) Cost: ⭐⭐ (2/5)

Best for:

  • "Compare all available models"
  • "How do I implement a complete chat system with X, Y, Z features?"
  • Questions requiring synthesis across many docs

4. llama-3.3-70b-code (Not yet implemented)

Best for: Code generation and technical implementation questions

Configuration:

  • Model: llama-3.3-70b-versatile
  • Max context pages: 5
  • Temperature: 0.2 (more deterministic for code)
  • Custom system prompt optimized for code

Expected Performance:

  • Similar to llama-3.3-70b-default
  • Total: ~1-3.5s

Quality (for code): ⭐⭐⭐⭐⭐ (5/5) Speed: ⭐⭐⭐ (3/5) Cost: ⭐⭐⭐ (3/5)

Best for:

  • "Show me example code for X"
  • "How do I implement Y in Python/JavaScript?"
  • API usage examples
  • Debugging code issues
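
The custom system prompt could be as simple as a few extra instructions layered on top of the default prompt. Illustrative only, since this strategy is not implemented yet:

// Illustrative system prompt for a code-focused strategy (not the real prompt).
const CODE_SYSTEM_PROMPT = `You are a Groq documentation assistant.
Prefer complete, runnable code examples and briefly explain key parameters.
Only use APIs and options that appear in the provided documentation context.`;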

5. citation-mode (Not yet implemented)

Best for: Academic/research use, when you need to cite sources

Configuration:

  • Model: llama-3.3-70b-versatile
  • Max context pages: 5
  • Temperature: 0.3
  • Custom system prompt for citations
  • Post-processing to add inline citations

Expected Performance:

  • Similar to llama-3.3-70b-default
  • +50-100ms for citation processing
  • Total: ~1.5-4s

Quality: ⭐⭐⭐⭐⭐ (5/5) Speed: ⭐⭐⭐ (3/5) Cost: ⭐⭐⭐ (3/5)

Best for:

  • Documentation generation
  • Training materials
  • When you need exact source references
  • Verifying information
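
The citation post-processing step could be as simple as appending a numbered source list to the model's answer. A sketch, assuming each context page carries a title and URL (this strategy is not implemented yet):

// Hypothetical post-processing: append numbered sources to the answer.
function addCitations(
  answer: string,
  sources: { title: string; url: string }[],
): string {
  const refs = sources
    .map((s, i) => `[${i + 1}] ${s.title} - ${s.url}`)
    .join("\n");
  return `${answer}\n\nSources:\n${refs}`;
}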

Comparison Matrix

| Strategy | Speed | Quality | Cost | Context | Best For |
| --- | --- | --- | --- | --- | --- |
| llama-3.3-70b-default ✅ | Medium | Excellent | Medium | 5 pages | General purpose, complex Q&A |
| llama-3.1-8b-fast | Fast | Good | Low | 2-3 pages | Simple lookups, high volume |
| mixtral-8x7b-extended | Slow | Excellent | High | 10 pages | Very complex questions |
| llama-3.3-70b-code | Medium | Excellent | Medium | 5 pages | Code generation |
| citation-mode | Medium | Excellent | Medium | 5 pages | Research, documentation |

Choosing a Strategy

Decision Tree

  1. Is the question simple and factual?

    • Yes → Use llama-3.1-8b-fast (when implemented)
    • No → Continue
  2. Does it require lots of documentation context?

    • Yes → Use mixtral-8x7b-extended (when implemented)
    • No → Continue
  3. Is it primarily about code/implementation?

    • Yes → Use llama-3.3-70b-code (when implemented)
    • No → Continue
  4. Do you need citations?

    • Yes → Use citation-mode (when implemented)
    • No → Use llama-3.3-70b-default ✅

By Question Type

Factual Questions ("What is X?")

  • 🥇 llama-3.1-8b-fast (fast + good enough)
  • 🥈 llama-3.3-70b-default (better quality)

How-To Questions ("How do I X?")

  • 🥇 llama-3.3-70b-default (balanced)
  • 🥈 llama-3.3-70b-code (if code-heavy)

Complex Questions ("How do I build X with Y and Z?")

  • 🥇 mixtral-8x7b-extended (more context)
  • 🥈 llama-3.3-70b-default (usually sufficient)

Code Examples ("Show me code for X")

  • 🥇 llama-3.3-70b-code (optimized)
  • 🥈 llama-3.3-70b-default (good enough)

Comparison Questions ("What's the difference between X and Y?")

  • 🥇 llama-3.3-70b-default (good reasoning)
  • 🥈 mixtral-8x7b-extended (if comparing many things)

By Performance Requirements

Need fastest response (<1s goal)

  • Use llama-3.1-8b-fast (when implemented)
  • Reduce maxContextPages to 2
  • Use fastest search strategy

Need best quality (no time constraint)

  • Use mixtral-8x7b-extended (when implemented)
  • Increase maxContextPages to 10
  • Raise minScore to 60+

Balanced (1-3s acceptable)

  • Use llama-3.3-70b-default ✅ (current default)
  • Default settings work well
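
Expressed as code, these knobs are just options on the answer call. The option names maxContextPages and minScore come from this document; the exact answerQuestion options shape is an assumption:

// Fastest: small context, fast strategy.
await answerQuestion(question, { maxContextPages: 2 });

// Best quality: more context, stricter relevance filtering.
await answerQuestion(question, { maxContextPages: 10, minScore: 60 });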

By Traffic Patterns

High volume, many simple questions

  • Implement and use llama-3.1-8b-fast
  • Consider caching common answers
  • Use rate limiting

Low volume, complex questions

  • Use llama-3.3-70b-default or mixtral-8x7b-extended
  • Quality over speed

Mixed traffic

  • Route simple questions to fast strategy
  • Route complex questions to powerful strategy
  • Consider implementing multi-strategy endpoint

Cost Considerations

Approximate token usage per answer (which drives cost under Groq's per-token pricing):

llama-3.3-70b-default

  • Context: ~10k tokens (5 pages × 2k each)
  • Answer: ~500 tokens
  • Total: ~10.5k tokens per answer

llama-3.1-8b-fast (estimated)

  • Context: ~3k tokens (3 pages × 1k each)
  • Answer: ~300 tokens
  • Total: ~3.3k tokens per answer
  • ~3x fewer tokens than the 70B default (and a lower per-token price)

mixtral-8x7b-extended (estimated)

  • Context: ~30k tokens (10 pages × 3k each)
  • Answer: ~700 tokens
  • Total: ~30.7k tokens per answer
  • ~3x the token usage of the default
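
These totals are simply pages × tokens-per-page plus the answer tokens; a tiny helper makes the arithmetic explicit:

// Rough per-answer token estimate: context pages plus the generated answer.
function estimateTokens(pages: number, tokensPerPage: number, answerTokens: number): number {
  return pages * tokensPerPage + answerTokens;
}

estimateTokens(5, 2000, 500);   // ~10,500 - llama-3.3-70b-default
estimateTokens(3, 1000, 300);   // ~3,300  - llama-3.1-8b-fast (estimated)
estimateTokens(10, 3000, 700);  // ~30,700 - mixtral-8x7b-extended (estimated)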

Cost-saving strategies:

  1. Use smaller models for simple questions
  2. Reduce maxContextPages
  3. Increase minScore (fewer pages in context)
  4. Cache common questions/answers
  5. Implement tiered routing

Implementation Guide

Adding a New Strategy

  1. Copy template:

    cp answer/llama-3.3-70b-default.ts answer/my-strategy.ts
  2. Modify configuration:

    const DEFAULT_MODEL = "your-model";
    const DEFAULT_MAX_CONTEXT_PAGES = 3;
    const DEFAULT_TEMPERATURE = 0.5;
  3. Update strategy metadata:

    export const answerStrategy: AnswerStrategy = {
      name: "my-strategy",
      description: "What makes it special",
      answer: async (query, options) => { /* ... */ },
    };
  4. Activate:

    // answer/index.ts
    import { answerStrategy } from "./my-strategy.ts";
  5. Test:

    curl "http://localhost:8000/answer/info" curl "http://localhost:8000/answer?q=test"

Testing Multiple Strategies

Create a comparison test script:

// testing/answer-comparison.ts
import { questions } from "./questions.ts";

const strategies = [
  "llama-3.3-70b-default",
  "llama-3.1-8b-fast",
  "mixtral-8x7b-extended",
];

for (const strategy of strategies) {
  for (const question of questions) {
    // Switch strategy
    // Run question
    // Collect metrics
  }
}

// Compare results

Monitoring and Optimization

Key Metrics to Track

  1. Response time breakdown:

    • Search duration
    • Context prep duration
    • LLM duration
    • Total duration
  2. Quality metrics:

    • User feedback/ratings
    • Answer correctness
    • Relevance of sources
  3. Cost metrics:

    • Tokens per answer
    • Cost per answer
    • Monthly spend
  4. Usage patterns:

    • Question types
    • Model distribution
    • Cache hit rates
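
One way to keep these together is a single per-answer metrics record. The field names below are hypothetical; adapt them to whatever logging the strategies actually emit:

// Hypothetical per-answer metrics record (field names are illustrative).
interface AnswerMetrics {
  strategy: string;          // which answer strategy produced the answer
  searchMs: number;          // search duration
  contextPrepMs: number;     // context preparation duration
  llmMs: number;             // LLM call duration
  totalMs: number;           // end-to-end duration
  promptTokens: number;      // context + question tokens
  completionTokens: number;  // answer tokens
  cacheHit?: boolean;        // whether the answer came from cache
  userRating?: number;       // optional quality feedback
}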

Optimization Tips

  1. Speed:

    • Profile with enableTiming: true (see the sketch after this list)
    • Optimize slowest component
    • Consider caching
  2. Quality:

    • Test with real questions
    • Tune system prompts
    • Adjust context size
  3. Cost:

    • Route by question complexity
    • Cache common answers
    • Use smaller models when possible
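
For example, profiling a single question with the enableTiming flag might look like this; the shape of the returned timings is an assumption:

// Sketch: profile one question with timing enabled (return shape assumed).
const result = await answerQuestion("How do I stream chat completions?", {
  enableTiming: true,
});
console.log(result.timings); // e.g. { searchMs, contextPrepMs, llmMs, totalMs }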

Future Enhancements

Multi-Strategy Routing

Automatically select strategy based on question analysis:

async function smartRoute(question: string) {
  const complexity = analyzeComplexity(question);
  const type = classifyQuestion(question);

  if (complexity === "simple" && type === "factual") {
    return "llama-3.1-8b-fast";
  } else if (type === "code") {
    return "llama-3.3-70b-code";
  } else if (complexity === "complex") {
    return "mixtral-8x7b-extended";
  }
  return "llama-3.3-70b-default";
}

Answer Caching

Cache answers for common questions:

const cachedAnswer = await cache.get(questionHash);
if (cachedAnswer) return cachedAnswer;

const answer = await generateAnswer(question);
await cache.set(questionHash, answer, ttl);
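
questionHash is left undefined above; one simple option (not necessarily what this project would use) is a SHA-256 digest of the normalized question:

// One option for questionHash: SHA-256 hex digest of the normalized question.
async function hashQuestion(question: string): Promise<string> {
  const bytes = new TextEncoder().encode(question.trim().toLowerCase());
  const digest = await crypto.subtle.digest("SHA-256", bytes);
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}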

Follow-up Questions

Support conversation context:

await answerQuestion(question, {
  conversationHistory: previousQA,
  maxContextPages: 3, // Reduced to fit history
});
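
conversationHistory here might simply be an array of prior question/answer turns; the shape is an assumption:

// Assumed shape for the conversation history passed to answerQuestion.
type QAPair = { question: string; answer: string };

const previousQA: QAPair[] = [
  { question: "What models are available?", answer: "..." },
];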

Resources

  • Groq Models: https://console.groq.com/docs/models
  • Groq Pricing: https://groq.com/pricing/
  • Model Benchmarks: https://console.groq.com/docs/benchmarks