# Search Strategy Comparison

This document helps you choose the best search strategy for your use case.

## Quick Decision Tree

```text
┌─ Need to run on Cloudflare Workers?
│  └─ YES → Use cloudflare-bge-cosine.ts
│
├─ Need 100% offline, no network calls?
│  └─ YES → Use transformers-local-onnx.ts (after downloading model)
│
├─ Want fastest setup with good performance?
│  └─ YES → Use transformers-cosine.ts (auto-downloads on first run)
│
├─ Need the best accuracy?
│  └─ YES → Use mixedbread-embeddings-cosine.ts or openai-cosine.ts
│
└─ Want managed search (no embeddings management)?
   └─ YES → Use jigsawstack-orama.ts or mixedbread.ts
```
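
All of these files are drop-in replacements for one another: each exports the same `searchStrategy` and `generateEmbeddings` pair (see the import swap under Migration Path below). The exact signatures are internal to this repo, so treat the following as an illustrative sketch of the shared shape, not the actual types:

```ts
// Hypothetical shape of what each strategy module exports. The export
// names come from the import swap under Migration Path; the signatures
// and the SearchResult fields are illustrative assumptions.
export type SearchResult = {
  id: string;     // document identifier (assumed)
  score: number;  // similarity score; higher means more relevant
};

// Embed a batch of texts into fixed-size vectors.
export declare function generateEmbeddings(
  texts: string[],
): Promise<number[][]>;

// Rank stored documents against a query string.
export declare function searchStrategy(
  query: string,
  limit?: number,
): Promise<SearchResult[]>;
```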

## Detailed Comparison

### Performance Metrics

| Strategy | First Load | Cached Load | Query Time | Total (cached) | Network Required |
|---|---|---|---|---|---|
| transformers-local-onnx | ~150ms | ~50ms | ~10-30ms | ~60-80ms | ❌ No |
| transformers-cosine | ~3-5s | ~150ms | ~10-30ms | ~160-180ms | ✅ First run only |
| mixedbread-embeddings | N/A | N/A | ~50-100ms | ~50-100ms | ✅ Every query |
| openai-cosine | N/A | N/A | ~100-200ms | ~100-200ms | ✅ Every query |
| hf-inference-qwen3 | N/A | N/A | ~150-300ms | ~150-300ms | ✅ Every query |
| cloudflare-bge | N/A | N/A | ~50-150ms | ~50-150ms | ✅ Every query |
| jigsawstack-orama | N/A | N/A | ~550ms | ~550ms | ✅ Every query |

### Cost Comparison

| Strategy | Cost | Free Tier | Notes |
|---|---|---|---|
| transformers-local-onnx | $0 | ∞ | 100% free, runs locally |
| transformers-cosine | $0 | ∞ | 100% free, runs locally |
| mixedbread-embeddings | $0-$ | Generous | Free tier: 150 req/min, 100M tokens/mo |
| openai-cosine | $$ | Limited | $0.0001/1K tokens (text-embedding-3-small) |
| hf-inference-qwen3 | $0 | Generous | Free tier: 1000 req/day |
| cloudflare-bge | $0 | Generous | Free tier: 10,000 req/day |
| jigsawstack-orama | $0-$ | Limited | Free tier: limited requests |

### Accuracy Comparison

Based on MTEB (Massive Text Embedding Benchmark) scores:

| Strategy | Model | Dimensions | MTEB Score | Notes |
|---|---|---|---|---|
| transformers-local-onnx | all-MiniLM-L6-v2 | 384 | ~58 | Fast, good quality |
| transformers-cosine | all-MiniLM-L6-v2 | 384 | ~58 | Same as local |
| mixedbread-embeddings | mxbai-embed-large-v1 | 1024 | ~64 | Higher quality |
| openai-cosine | text-embedding-3-small | 1536 | ~62 | Reliable, tested |
| hf-inference-qwen3 | Qwen3-Embedding-8B | 768 | ~65 | Very high quality |
| cloudflare-bge | bge-large-en-v1.5 | 1024 | ~64 | Good quality |
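
The `-cosine` suffix on most of these strategies refers to how results are ranked: query and document embeddings are compared by cosine similarity, regardless of which model produced them or how many dimensions it uses. As a refresher, a minimal implementation:

```ts
// Cosine similarity between two embedding vectors of equal length.
// Returns a value in [-1, 1]; higher means more semantically similar.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Because the score normalizes for vector length, rankings are comparable within a single model, but vectors from different models (or dimension counts) cannot be mixed in one index — which is why the Migration Path below includes a recalculation step.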

## Use Case Recommendations

### 🏆 Production Deployment (Recommended)

**Use:** `transformers-local-onnx.ts`

**Why:**

- Predictable performance (no network variance)
- No API costs
- No rate limits
- Works offline
- Fast after initial load

**Best for:**

- Deno Deploy
- Render / Railway / Fly.io
- Docker containers
- Any environment with file system access

### 🚀 Quick Start / Prototyping

**Use:** `transformers-cosine.ts`

**Why:**

- Zero setup (auto-downloads the model)
- No API keys needed
- Good performance after the first run
- Easy to switch to local later

**Best for:**

- Development
- Testing
- Quick demos
- When you don't want to download models manually

### ☁️ Cloudflare Workers

**Use:** `cloudflare-bge-cosine.ts`

**Why:**

- Workers have a 1MB code size limit (local models won't fit)
- Cloudflare AI is optimized for Workers
- The free tier is generous
- Low latency

**Best for:**

- Cloudflare Workers/Pages
- Edge deployment
- Global low-latency requirements

### 🎯 Maximum Accuracy

**Use:** `hf-inference-qwen3-cosine.ts` or `mixedbread-embeddings-cosine.ts`

**Why:**

- Higher MTEB scores
- Better semantic understanding
- More dimensions (768-1024 vs 384)

**Best for:**

- When accuracy matters more than speed
- Complex semantic queries
- Production with budget for API calls

### 📦 Managed Solution

**Use:** `mixedbread.ts` or `jigsawstack-orama.ts`

**Why:**

- No embedding management needed
- Handles storage, search, and embeddings
- Less code to maintain

**Best for:**

- When you want a managed solution
- When you don't want to store embeddings yourself
- When you prefer APIs over local computation

## Migration Path

### From API-based to Local

1. Download the model:

   ```sh
   cd search/models
   ./download-model.sh
   ```

2. Update `search/index.ts`:

   ```ts
   // Before
   // import { searchStrategy, generateEmbeddings } from "./openai-cosine.ts";

   // After
   import { searchStrategy, generateEmbeddings } from "./transformers-local-onnx.ts";
   ```

3. Recalculate embeddings (if dimensions differ) via the cache endpoint, as sketched below:

   ```http
   GET /cache/recalculate?force=true
   ```
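
For example, with curl against a running deployment (the base URL here is a placeholder; substitute wherever this app is hosted):

```sh
# Hypothetical host — substitute your own deployment URL.
curl "https://your-deployment.example.com/cache/recalculate?force=true"
```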

### From transformers-cosine to transformers-local-onnx

1. Download the model (same as above)
2. Update the import
3. No recalculation needed (same model, same dimensions)

## Environment-Specific Notes

### Val.town

- **Recommended:** `transformers-cosine.ts` (isolate has caching)
- **Alternative:** any API-based strategy
- **Avoid:** `transformers-local-onnx.ts` (no persistent file system)

### Deno Deploy

- **Recommended:** `transformers-local-onnx.ts` (with the deployment package)
- **Alternative:** `transformers-cosine.ts` or an API-based strategy
- **Note:** include the model files in the deployment

### Docker

- **Recommended:** `transformers-local-onnx.ts`
- **Note:** include the model files in the image
- **Example:**

  ```dockerfile
  COPY search/models/all-MiniLM-L6-v2 /app/search/models/all-MiniLM-L6-v2
  ```
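
Putting it together, a minimal Dockerfile sketch for a Deno app along these lines (the base image, entrypoint, and permission flags are assumptions, not taken from this repo):

```dockerfile
# Sketch only — base image, entrypoint, and flags are assumptions.
FROM denoland/deno:alpine
WORKDIR /app

# Application code.
COPY deno.json main.tsx ./
COPY search/ ./search/

# The ONNX model rides along inside search/models/, so the container
# needs no model download at runtime.
CMD ["deno", "run", "--allow-read", "--allow-env", "--allow-net", "main.tsx"]
```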

### Serverless (AWS Lambda, Google Cloud Functions)

- **Recommended:** API-based strategies (shorter cold starts)
- **Alternative:** `transformers-local-onnx.ts` (if you can tolerate the cold starts)
- **Note:** local models increase the deployment package size

### Local Development

- **Recommended:** `transformers-cosine.ts` (auto-download)
- **Alternative:** `transformers-local-onnx.ts` (if already downloaded)
- **Note:** both work great for development

## Benchmarking Your Setup

Run the test harness to compare strategies on your infrastructure:

```sh
cd testing
deno run --allow-read --allow-env --allow-net test-search.ts
```

This will show actual performance numbers for your specific setup.
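
For a quick manual timing of a single strategy without the harness, something like this works in Deno (run from the `search/` directory; the import mirrors the swap shown under Migration Path, and the exact signature is an assumption):

```ts
// Hand-rolled timing sketch; assumes the strategy module's exports
// match the import shown under Migration Path.
import { generateEmbeddings } from "./transformers-local-onnx.ts";

const t0 = performance.now();
await generateEmbeddings(["warm-up query"]); // first call pays the model load
const t1 = performance.now();
await generateEmbeddings(["a typical search query"]);
const t2 = performance.now();

console.log(`first load:   ${(t1 - t0).toFixed(1)}ms`);
console.log(`cached query: ${(t2 - t1).toFixed(1)}ms`);
```

The two numbers correspond to the "First Load" and "Cached Load" columns in the performance table above.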

## Summary Table

| Criteria | Best Choice |
|---|---|
| Fastest | transformers-local-onnx |
| Easiest Setup | transformers-cosine |
| Most Accurate | hf-inference-qwen3 |
| Cheapest | transformers-local-onnx (free) |
| Best for Production | transformers-local-onnx |
| Best for Cloudflare | cloudflare-bge-cosine |
| Best for Val.town | transformers-cosine |
| Most Reliable | openai-cosine |
| Fully Managed | mixedbread or jigsawstack |

## Still Unsure?

**Default recommendation:** `transformers-local-onnx.ts`

It offers the best combination of speed, cost (free), and reliability for most production use cases. The only downside is the initial setup (downloading model files), which takes a few minutes.

If you can't download models or need to deploy immediately, use `transformers-cosine.ts`; it auto-downloads the model on first run.
