This document helps you choose the best search strategy for your use case.
```
┌─ Need to run on Cloudflare Workers?
│  └─ YES → Use cloudflare-bge-cosine.ts
│
├─ Need 100% offline, no network calls?
│  └─ YES → Use transformers-local-onnx.ts (after downloading model)
│
├─ Want fastest setup with good performance?
│  └─ YES → Use transformers-cosine.ts (auto-downloads on first run)
│
├─ Need the best accuracy?
│  └─ YES → Use mixedbread-embeddings-cosine.ts or openai-cosine.ts
│
└─ Want managed search (no embeddings management)?
   └─ YES → Use jigsawstack-orama.ts or mixedbread.ts
```
| Strategy | First Load | Cached Load | Query Time | Total (cached) | Network Required |
|---|---|---|---|---|---|
| transformers-local-onnx | ~150ms | ~50ms | ~10-30ms | ~60-80ms | ❌ No |
| transformers-cosine | ~3-5s | ~150ms | ~10-30ms | ~160-180ms | ✅ First run only |
| mixedbread-embeddings | N/A | N/A | ~50-100ms | ~50-100ms | ✅ Every query |
| openai-cosine | N/A | N/A | ~100-200ms | ~100-200ms | ✅ Every query |
| hf-inference-qwen3 | N/A | N/A | ~150-300ms | ~150-300ms | ✅ Every query |
| cloudflare-bge | N/A | N/A | ~50-150ms | ~50-150ms | ✅ Every query |
| jigsawstack-orama | N/A | N/A | ~550ms | ~550ms | ✅ Every query |
| Strategy | Cost | Free Tier | Notes |
|---|---|---|---|
| transformers-local-onnx | $0 | ∞ | 100% free, runs locally |
| transformers-cosine | $0 | ∞ | 100% free, runs locally |
| mixedbread-embeddings | $0-$ | Generous | Free tier: 150 req/min, 100M tokens/mo |
| openai-cosine | $$ | Limited | ~$0.02/1M tokens (text-embedding-3-small) |
| hf-inference-qwen3 | $0 | Generous | Free tier: 1000 req/day |
| cloudflare-bge | $0 | Generous | Free tier: 10,000 req/day |
| jigsawstack-orama | $0-$ | Limited | Free tier: limited requests |
Based on MTEB (Massive Text Embedding Benchmark) scores:
| Strategy | Model | Dimensions | MTEB Score | Notes |
|---|---|---|---|---|
| transformers-local-onnx | all-MiniLM-L6-v2 | 384 | ~58 | Fast, good quality |
| transformers-cosine | all-MiniLM-L6-v2 | 384 | ~58 | Same as local |
| mixedbread-embeddings | mxbai-embed-large-v1 | 1024 | ~64 | Higher quality |
| openai-cosine | text-embedding-3-small | 1536 | ~62 | Reliable, tested |
| hf-inference-qwen3 | Qwen3-Embedding-8B | 768 | ~65 | Very high quality |
| cloudflare-bge | bge-large-en-v1.5 | 1024 | ~64 | Good quality |
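Every *-cosine strategy ranks results the same way: embed the query, then score each stored document embedding by cosine similarity. The sketch below is an illustrative helper, not code from the repo; it also shows why query and document vectors must come from the same model with the same dimension count, which is what the "recalculate embeddings" step in the migration notes is about.

```ts
// Illustrative helper, not the repo's actual implementation.
// Both vectors must come from the same embedding model (same dimensions),
// which is why switching strategies can require recalculating embeddings.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```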
Use: transformers-local-onnx.ts
Why:
- Predictable performance (no network variance)
- No API costs
- No rate limits
- Works offline
- Fast after initial load
Best for:
- Deno Deploy
- Render / Railway / Fly.io
- Docker containers
- Any environment with file system access
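For reference, the offline setup roughly amounts to pointing transformers.js at the downloaded model directory and disabling remote fetches. This is a hedged sketch of that idea; the package name, option names, and paths assume the @huggingface/transformers npm package, and the real transformers-local-onnx.ts may be wired differently.

```ts
import { env, pipeline } from "npm:@huggingface/transformers";

// Assumption: download-model.sh placed the ONNX export and tokenizer files
// under search/models/all-MiniLM-L6-v2.
env.allowRemoteModels = false;          // never hit the network
env.localModelPath = "./search/models"; // resolve model IDs against local files

const embed = await pipeline("feature-extraction", "all-MiniLM-L6-v2");
const output = await embed("reset my password", { pooling: "mean", normalize: true });
const vector = Array.from(output.data as Float32Array); // 384-dimensional embedding
console.log(vector.length);
```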
Use: transformers-cosine.ts
Why:
- Zero setup (auto-downloads model)
- No API keys needed
- Good performance after first run
- Easy to switch to local later
Best for:
- Development
- Testing
- Quick demos
- When you don't want to download models manually
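Under the hood this is the same transformers.js pipeline, just pointed at the hosted model ID so the weights are fetched and cached automatically on first use. Again a hedged sketch rather than the actual contents of transformers-cosine.ts:

```ts
import { pipeline } from "npm:@huggingface/transformers";

// The first call downloads and caches the model; later calls reuse the cache,
// which is why only the first run needs network access.
const embed = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");
const output = await embed("quick demo query", { pooling: "mean", normalize: true });
console.log(output.dims); // e.g. [1, 384]
```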
Use: cloudflare-bge-cosine.ts
Why:
- Workers have a 1MB code size limit (can't fit local models)
- Cloudflare AI is optimized for Workers
- Free tier is generous
- Low latency
Best for:
- Cloudflare Workers/Pages
- Edge deployment
- Global low-latency requirements
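For orientation, a Worker with an AI binding can request BGE embeddings roughly as sketched below. The binding name, model ID, and response shape follow Cloudflare's Workers AI conventions but are assumptions here; cloudflare-bge-cosine.ts may differ in its details.

```ts
// Minimal typing for the AI binding; the real binding is configured in wrangler.toml.
interface Env {
  AI: { run(model: string, input: unknown): Promise<{ data: number[][] }> };
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { query } = (await request.json()) as { query: string };
    // "@cf/baai/bge-large-en-v1.5" is a Workers AI embedding model (1024 dimensions).
    const result = await env.AI.run("@cf/baai/bge-large-en-v1.5", { text: [query] });
    const embedding = result.data[0];
    return Response.json({ dimensions: embedding.length });
  },
};
```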
Use: hf-inference-qwen3-cosine.ts or mixedbread-embeddings-cosine.ts
Why:
- Higher MTEB scores
- Better semantic understanding
- More dimensions (768-1024 vs 384)
Best for:
- When accuracy matters more than speed
- Complex semantic queries
- Production with budget for API calls
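All of the API-based strategies reduce to one HTTPS call per query. The OpenAI variant is the simplest to sketch with the official client; openai-cosine.ts itself may be organized differently, and the Hugging Face and Mixedbread strategies follow the same request/response pattern against their own endpoints.

```ts
import OpenAI from "npm:openai";

const client = new OpenAI({ apiKey: Deno.env.get("OPENAI_API_KEY") });

const res = await client.embeddings.create({
  model: "text-embedding-3-small",
  input: "complex semantic query",
});
const embedding = res.data[0].embedding; // 1536-dimensional vector
console.log(embedding.length);
```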
Use: mixedbread.ts or jigsawstack-orama.ts
Why:
- No embedding management needed
- Handles storage, search, and embeddings
- Less code to maintain
Best for:
- When you want a managed solution
- Don't want to store embeddings yourself
- Prefer APIs over local computation
Migrating to transformers-local-onnx.ts from an API-based strategy (for example openai-cosine.ts):

- Download the model:

  ```bash
  cd search/models
  ./download-model.sh
  ```

- Update search/index.ts:

  ```ts
  // Before
  // import { searchStrategy, generateEmbeddings } from "./openai-cosine.ts";

  // After
  import { searchStrategy, generateEmbeddings } from "./transformers-local-onnx.ts";
  ```

- Recalculate embeddings (if dimensions differ):

  ```
  GET /cache/recalculate?force=true
  ```
Migrating from transformers-cosine.ts to transformers-local-onnx.ts:

- Download the model (same as above)
- Update the import
- No recalculation needed (same model, same dimensions)
Val.town:

- Recommended: transformers-cosine.ts (isolate has caching)
- Alternative: Any API-based strategy
- Avoid: transformers-local-onnx.ts (no persistent file system)
Deno Deploy / Render / Railway / Fly.io:

- Recommended: transformers-local-onnx.ts (with deployment package)
- Alternative: transformers-cosine.ts or API-based
- Note: Include model files in the deployment
Docker:

- Recommended: transformers-local-onnx.ts
- Note: Include model files in the image
- Example (a fuller sketch follows below):

  ```dockerfile
  COPY search/models/all-MiniLM-L6-v2 /app/search/models/all-MiniLM-L6-v2
  ```
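A fuller (hypothetical) Dockerfile might look like the sketch below; the base image, entrypoint, and permission flags are assumptions, with the flags borrowed from the test-harness command later in this document.

```dockerfile
# Hypothetical sketch; adjust paths and the entrypoint for your project.
FROM denoland/deno:latest
WORKDIR /app

# Application code
COPY . /app

# Bake the model into the image so the container needs no network at runtime
COPY search/models/all-MiniLM-L6-v2 /app/search/models/all-MiniLM-L6-v2

CMD ["deno", "run", "--allow-read", "--allow-env", "--allow-net", "search/index.ts"]
```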
Serverless functions:

- Recommended: API-based strategies (less cold start time)
- Alternative: transformers-local-onnx.ts (if you can handle cold starts)
- Note: Local models increase deployment package size
Local development:

- Recommended: transformers-cosine.ts (auto-download)
- Alternative: transformers-local-onnx.ts (if downloaded)
- Note: Both work great for development
Run the test harness to compare strategies on your infrastructure:
```bash
cd testing
deno run --allow-read --allow-env --allow-net test-search.ts
```
This will show actual performance numbers for your specific setup.
| Criteria | Best Choice |
|---|---|
| Fastest | transformers-local-onnx |
| Easiest Setup | transformers-cosine |
| Most Accurate | hf-inference-qwen3 |
| Cheapest | transformers-local-onnx (free) |
| Best for Production | transformers-local-onnx |
| Best for Cloudflare | cloudflare-bge-cosine |
| Best for Val.town | transformers-cosine |
| Most Reliable | openai-cosine |
| Fully Managed | mixedbread or jigsawstack |
Default recommendation: transformers-local-onnx.ts
It offers the best combination of speed, cost (free), and reliability for most production use cases. The only downside is the initial setup (downloading model files), which takes a few minutes.
If you can't download models or need to deploy immediately, use transformers-cosine.ts as it auto-downloads on first run.
