This document helps you choose the best search strategy for your use case.
```
┌─ Need to run on Cloudflare Workers?
│  └─ YES → Use cloudflare-bge-cosine.ts
│
├─ Need 100% offline, no network calls?
│  └─ YES → Use transformers-local-onnx.ts (after downloading the model)
│
├─ Want the fastest setup with good performance?
│  └─ YES → Use transformers-cosine.ts (auto-downloads on first run)
│
├─ Need the best accuracy?
│  └─ YES → Use hf-inference-qwen3-cosine.ts or mixedbread-embeddings-cosine.ts
│
└─ Want managed search (no embeddings management)?
   └─ YES → Use jigsawstack-orama.ts or mixedbread.ts
```
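The "-cosine" suffix shared by most strategies refers to the ranking step: the query embedding is compared against each stored document embedding by cosine similarity. A minimal TypeScript sketch of that scoring function:

```ts
// Cosine similarity between two equal-length embedding vectors.
// Returns a score in [-1, 1]; higher means more semantically similar.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```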
Approximate latency by strategy:

| Strategy | First Load | Cached Load | Query Time | Total (cached) | Network Required |
|---|---|---|---|---|---|
| transformers-local-onnx | ~150ms | ~50ms | ~10-30ms | ~60-80ms | ❌ No |
| transformers-cosine | ~3-5s | ~150ms | ~10-30ms | ~160-180ms | ✅ First run only |
| mixedbread-embeddings | N/A | N/A | ~50-100ms | ~50-100ms | ✅ Every query |
| openai-cosine | N/A | N/A | ~100-200ms | ~100-200ms | ✅ Every query |
| hf-inference-qwen3 | N/A | N/A | ~150-300ms | ~150-300ms | ✅ Every query |
| cloudflare-bge | N/A | N/A | ~50-150ms | ~50-150ms | ✅ Every query |
| jigsawstack-orama | N/A | N/A | ~550ms | ~550ms | ✅ Every query |
Approximate cost by strategy:

| Strategy | Cost | Free Tier | Notes |
|---|---|---|---|
| transformers-local-onnx | $0 | ∞ | 100% free, runs locally |
| transformers-cosine | $0 | ∞ | 100% free, runs locally |
| mixedbread-embeddings | $0-$ | Generous | Free tier: 150 req/min, 100M tokens/mo |
| openai-cosine | $$ | Limited | ~$0.02/1M tokens (text-embedding-3-small) |
| hf-inference-qwen3 | $0 | Generous | Free tier: 1000 req/day |
| cloudflare-bge | $0 | Generous | Free tier: 10,000 req/day |
| jigsawstack-orama | $0-$ | Limited | Free tier: limited requests |
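Embedding spend scales linearly with token volume, so the paid tiers are easy to estimate. A quick sketch for openai-cosine, assuming the ~$0.02 per 1M token list price above (verify current pricing before budgeting):

```ts
// Rough monthly embedding cost in USD for text-embedding-3-small.
// PRICE_PER_MILLION is assumed from the list price; check current pricing.
const PRICE_PER_MILLION = 0.02;
const monthlyCostUSD = (tokensPerMonth: number) =>
  (tokensPerMonth / 1_000_000) * PRICE_PER_MILLION;

monthlyCostUSD(100_000_000); // => 2, i.e. ~$2 for 100M tokens/month
```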
Based on MTEB (Massive Text Embedding Benchmark) scores:
| Strategy | Model | Dimensions | MTEB Score | Notes |
|---|---|---|---|---|
| transformers-local-onnx | all-MiniLM-L6-v2 | 384 | ~58 | Fast, good quality |
| transformers-cosine | all-MiniLM-L6-v2 | 384 | ~58 | Same as local |
| mixedbread-embeddings | mxbai-embed-large-v1 | 1024 | ~64 | Higher quality |
| openai-cosine | text-embedding-3-small | 1536 | ~62 | Reliable, tested |
| hf-inference-qwen3 | Qwen3-Embedding-8B | 768 | ~65 | Very high quality |
| cloudflare-bge | bge-large-en-v1.5 | 1024 | ~64 | Good quality |
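Dimensions also drive storage and memory: at float32 precision each dimension costs 4 bytes per vector, so the higher-quality models carry a real footprint cost. For example:

```ts
// Bytes needed to store one float32 embedding of a given dimensionality.
const bytesPerVector = (dims: number) => dims * 4;

bytesPerVector(384);  // 1_536 bytes (all-MiniLM-L6-v2)
bytesPerVector(1536); // 6_144 bytes (text-embedding-3-small)
// 100k documents at 1024 dims ≈ 410 MB of raw vectors.
```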
Recommendations by scenario:

Use: transformers-local-onnx.ts
Why: Fastest total latency (~60-80ms cached), $0 cost, and no network calls at query time.
Best for: Production servers, offline environments, and latency-sensitive workloads.
Use: transformers-cosine.ts
Why: Zero setup: the model auto-downloads on first run, with the same all-MiniLM-L6-v2 quality as the local variant (see the sketch below).
Best for: Quick prototypes and platforms without a persistent file system, such as Val.town.
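For reference, a minimal sketch of how a transformers.js-based strategy produces an embedding. It assumes the Xenova/all-MiniLM-L6-v2 checkpoint and the @huggingface/transformers feature-extraction pipeline; the actual module may differ:

```ts
import { pipeline } from "npm:@huggingface/transformers";

// Downloads and caches the model on first run (the "first load" cost above).
const extractor = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");

// Mean-pool token embeddings and L2-normalize into one 384-dim vector.
const output = await extractor("How do I reset my password?", {
  pooling: "mean",
  normalize: true,
});
const embedding = Array.from(output.data as Float32Array); // length 384
```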
Use: cloudflare-bge-cosine.ts
Why: Runs natively on Cloudflare Workers with a generous free tier (10,000 req/day) and ~50-150ms queries.
Best for: Deployments on Cloudflare Workers.
Use: hf-inference-qwen3-cosine.ts or mixedbread-embeddings-cosine.ts
Why: The highest embedding quality in this lineup (MTEB ~65 and ~64), both with generous free tiers.
Best for: Accuracy-critical search where an extra 50-300ms per query is acceptable.
Use: mixedbread.ts or jigsawstack-orama.ts
Why: Fully managed search, with no embeddings to generate, store, or recalculate yourself.
Best for: Teams that want to offload search infrastructure entirely.
To switch strategies (here, to transformers-local-onnx.ts):

1. Download the model:

   ```sh
   cd search/models
   ./download-model.sh
   ```

2. Update search/index.ts:

   ```ts
   // Before
   // import { searchStrategy, generateEmbeddings } from "./openai-cosine.ts";

   // After
   import { searchStrategy, generateEmbeddings } from "./transformers-local-onnx.ts";
   ```

3. Recalculate embeddings (if dimensions differ):

   ```
   GET /cache/recalculate?force=true
   ```
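For example, triggered from a script (the host below is a placeholder; use your deployment's URL):

```ts
// Hypothetical call; replace localhost with your deployment's host.
const res = await fetch("http://localhost:8000/cache/recalculate?force=true");
console.log(res.status, await res.text());
```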
Deployment notes:

- Val.town: transformers-cosine.ts (isolate has caching); transformers-local-onnx.ts won't work (no persistent file system).
- Platforms that bundle files at deploy time: transformers-local-onnx.ts (with deployment package), or transformers-cosine.ts or an API-based strategy.
- Docker: transformers-local-onnx.ts, copying the model into the image:

  ```dockerfile
  COPY search/models/all-MiniLM-L6-v2 /app/search/models/all-MiniLM-L6-v2
  ```

- Serverless: transformers-local-onnx.ts (if you can handle cold starts).
- Local development: transformers-cosine.ts (auto-download) or transformers-local-onnx.ts (if downloaded).

Run the test harness to compare strategies on your infrastructure:
```sh
cd testing
deno run --allow-read --allow-env --allow-net test-search.ts
```
This will show actual performance numbers for your specific setup.
Summary of best choices:

| Criteria | Best Choice |
|---|---|
| Fastest | transformers-local-onnx |
| Easiest Setup | transformers-cosine |
| Most Accurate | hf-inference-qwen3 |
| Cheapest | transformers-local-onnx (free) |
| Best for Production | transformers-local-onnx |
| Best for Cloudflare | cloudflare-bge-cosine |
| Best for Val.town | transformers-cosine |
| Most Reliable | openai-cosine |
| Fully Managed | mixedbread or jigsawstack |
Default recommendation: transformers-local-onnx.ts
It offers the best combination of speed, cost (free), and reliability for most production use cases. The only downside is the initial setup (downloading model files), which takes a few minutes.
If you can't download models or need to deploy immediately, use transformers-cosine.ts as it auto-downloads on first run.
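The import-swap step above implies every strategy module exports the same names, so consumer code never changes beyond the import line. A hypothetical usage sketch (the exact signatures are not shown in this document and may differ):

```ts
// Hypothetical consumer; generateEmbeddings/searchStrategy signatures assumed.
import { searchStrategy, generateEmbeddings } from "./search/transformers-local-onnx.ts";

// Embed documents once (e.g., at index time)...
const docVectors = await generateEmbeddings(["Reset your password", "Billing FAQ"]);

// ...then answer queries through the shared strategy interface.
const results = await searchStrategy("how do I reset my password?");
```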