Get the fastest semantic search running in 3 steps.
```bash
cd search/models
./download-model.sh
```
- Time: ~2-5 minutes (depending on internet speed)
- Size: ~90MB download
- What it does: downloads the all-MiniLM-L6-v2 ONNX model from Hugging Face
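
Prefer to see what the script does? Below is a minimal Deno sketch of the same download. The repo URL and file list are assumptions based on the standard Hugging Face layout for Xenova/all-MiniLM-L6-v2; the actual script may differ. Run it with `deno run --allow-net --allow-write fetch-model.ts`.

```ts
// fetch-model.ts - a sketch of what download-model.sh automates.
// Repo URL and file list are assumptions based on the standard
// Hugging Face layout for Xenova/all-MiniLM-L6-v2.
const repo = "https://huggingface.co/Xenova/all-MiniLM-L6-v2/resolve/main";
const files = [
  "config.json",
  "tokenizer.json",
  "tokenizer_config.json",
  "onnx/model.onnx",
];

for (const file of files) {
  const res = await fetch(`${repo}/${file}`);
  if (!res.ok) throw new Error(`Failed to fetch ${file}: ${res.status}`);
  const dest = `all-MiniLM-L6-v2/${file}`;
  await Deno.mkdir(dest.slice(0, dest.lastIndexOf("/")), { recursive: true });
  await Deno.writeFile(dest, new Uint8Array(await res.arrayBuffer()));
  console.log(`Saved ${dest}`);
}
```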
Edit search/index.ts:
```ts
// Comment out the current strategy
// import { searchStrategy, generateEmbeddings } from "./transformers-cosine.ts";

// Uncomment the local ONNX strategy
import { searchStrategy, generateEmbeddings } from "./transformers-local-onnx.ts";
```
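
For context, a local-only strategy like this is typically built on transformers.js with remote fetches disabled. Here is a hedged sketch of the embedding half only (this is not the actual contents of transformers-local-onnx.ts, and the `searchStrategy` export is omitted):

```ts
// Sketch of a local-only embedding strategy - not the repo's actual file.
import { env, pipeline } from "npm:@xenova/transformers";

env.allowRemoteModels = false;   // never hit the network
env.localModelPath = "./models"; // parent directory of all-MiniLM-L6-v2

// Create the pipeline once and reuse it across queries.
// quantized: false loads the full onnx/model.onnx ("Using full model").
const extractor = await pipeline("feature-extraction", "all-MiniLM-L6-v2", {
  quantized: false,
});

export async function generateEmbeddings(text: string): Promise<number[]> {
  const output = await extractor(text, { pooling: "mean", normalize: true });
  return Array.from(output.data as Float32Array); // 384 dimensions
}
```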
```bash
cd models
deno run --allow-read --allow-env --allow-net --allow-ffi test-local-model.ts
```
Note: `--allow-ffi` is required for the ONNX runtime's native bindings.
Expected output:
```
📂 Loading local ONNX model from: /path/to/models/all-MiniLM-L6-v2
   Using full model
✅ Local ONNX model loaded successfully:
   npm import: 102ms
   pipeline load: 137ms (from local files)
   total model load: 239ms
✅ Generated embeddings successfully!
   Dimensions: 384
   First 5 values: [-0.0457, -0.0109, -0.0935, ...]
   Time: 247ms

Test 2: Generate embeddings for multiple queries (testing cache)
✅ "How to use Groq API?"
   Time: 3.87ms (cached pipeline)
```
Note: You may see a harmless `mutex lock failed` error at the very end. This is a known ONNX runtime cleanup issue in Deno that affects only standalone scripts that exit immediately; your long-running server (main.tsx) won't hit it. All functionality works correctly!
Your search is now running with:
- ✅ No network calls (100% offline)
- ✅ No API keys needed
- ✅ ~10-30ms query time
- ✅ ~60-80ms total search time
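
As a quick usage sketch, you can rank text by cosine similarity directly. This assumes `search/index.ts` re-exports `generateEmbeddings` and that it returns normalized `number[]` vectors, so the dot product equals cosine similarity:

```ts
import { generateEmbeddings } from "./search/index.ts";

// For normalized embeddings, the dot product equals cosine similarity.
const dot = (a: number[], b: number[]) => a.reduce((s, v, i) => s + v * b[i], 0);

const query = await generateEmbeddings("How to use Groq API?");
// Stand-in corpus for illustration; real pages come from your cache.
const pages = ["Groq API quickstart", "Deploying Deno apps"];
const scored = await Promise.all(
  pages.map(async (text) => ({
    text,
    score: dot(query, await generateEmbeddings(text)),
  })),
);
scored.sort((a, b) => b.score - a.score);
console.log(scored);
```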
Start your app and check the /search/test endpoint:
```bash
deno run --allow-net --allow-env --allow-read main.tsx
```
Then visit:
http://localhost:8000/search/test?q=How%20to%20use%20Groq%20API
Look for:
{ "metadata": { "strategy": "transformers-local-onnx", "localModel": true, "modelPath": "/path/to/models/all-MiniLM-L6-v2", "timings": { "queryEmbedding": 25, "total": 65 } } }
Cause: Model files not found
Fix: Make sure you ran the download script:
```bash
cd search/models
./download-model.sh
```
Verify files exist:
```bash
ls -lh all-MiniLM-L6-v2/onnx/
```
You should see `model.onnx` (~23MB).
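
Or check programmatically; a tiny sketch (path assumed relative to the repo root; run with `--allow-read`):

```ts
// Fail fast if the model file is missing.
const info = await Deno.stat("search/models/all-MiniLM-L6-v2/onnx/model.onnx");
console.log(`model.onnx: ${(info.size / 1_048_576).toFixed(1)} MB`);
```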
Cause: Wrong import path
Fix: Check search/index.ts has the correct import:
```ts
import { searchStrategy, generateEmbeddings } from "./transformers-local-onnx.ts";
```
Cause: Using wrong strategy (auto-download version)
Fix: Confirm the import is `transformers-local-onnx.ts`, not `transformers-cosine.ts`.
Edit search/transformers-local-onnx.ts:
```ts
const USE_QUANTIZED = true; // Change from false
```
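
This flag presumably maps onto the transformers.js `quantized` pipeline option; a sketch of that wiring, assuming the strategy threads it through:

```ts
import { pipeline } from "npm:@xenova/transformers";

const USE_QUANTIZED = true;
// In transformers.js v2, quantized: true makes the pipeline load
// onnx/model_quantized.onnx instead of onnx/model.onnx.
const extractor = await pipeline("feature-extraction", "all-MiniLM-L6-v2", {
  quantized: USE_QUANTIZED,
});
```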
Benefits:
- Smaller: ~6MB vs ~23MB
- Faster: ~15-20ms vs ~25-30ms per query
- Slightly less accurate: ~57 vs ~58 MTEB score
If you were using a different strategy before, regenerate embeddings:
```
GET /cache/recalculate?force=true
```
This ensures all page embeddings use the same model.
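
For example, from a one-off script (server URL assumed from this guide; run with `--allow-net`):

```ts
// Trigger a full re-embed so every page uses the local ONNX model.
const res = await fetch("http://localhost:8000/cache/recalculate?force=true");
console.log(res.status, await res.text());
```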
| Metric | Before (transformers-cosine) | After (transformers-local-onnx) |
|---|---|---|
| First run | ~3-5s | ~150ms |
| Cached | ~150ms | ~50ms |
| Query | ~10-30ms | ~10-30ms |
| Total | ~160-180ms | ~60-80ms |
| Network | Required (first run) | None |
Result: ~2-3x faster, no network dependency! 🚀
- Full Setup Guide: `search/models/SETUP.md`
- Strategy Comparison: `search/STRATEGY-COMPARISON.md`
- Search Module Docs: `search/README.md`
- Main Docs: `README.md` (see "Search" section)
- Check `search/models/SETUP.md` for detailed instructions
- Read `search/STRATEGY-COMPARISON.md` to compare all strategies
- Run `search/models/test-local-model.ts` to verify your setup
- Check the console for error messages with helpful hints
