Get the fastest semantic search running in 3 steps.
cd search/models && ./download-model.sh
Time: ~2-5 minutes (depending on internet speed)
Size: ~90MB download
What it does: Downloads the all-MiniLM-L6-v2 ONNX model from Hugging Face
Edit search/index.ts:
// Comment out the current strategy
// import { searchStrategy, generateEmbeddings } from "./transformers-cosine.ts";
// Uncomment the local ONNX strategy
import { searchStrategy, generateEmbeddings } from "./transformers-local-onnx.ts";
cd models && deno run --allow-read --allow-env --allow-net --allow-ffi test-local-model.ts
Note: --allow-ffi is required for ONNX runtime native bindings.
Expected output:
Loading local ONNX model from: /path/to/models/all-MiniLM-L6-v2
Using full model
Local ONNX model loaded successfully:
npm import: 102ms
pipeline load: 137ms (from local files)
total model load: 239ms
Generated embeddings successfully!
Dimensions: 384
First 5 values: [-0.0457, -0.0109, -0.0935, ...]
Time: 247ms
Test 2: Generate embeddings for multiple queries (testing cache)
"How to use Groq API?"
Time: 3.87ms (cached pipeline)
Note: You may see a harmless error at the very end (mutex lock failed) - this is a known ONNX runtime cleanup issue in Deno that only affects standalone scripts that exit immediately. Your long-running server (main.tsx) won't have this issue. All functionality works correctly!
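The test output above reports 384-dimensional embeddings. As a rough illustration of how such vectors are usually compared, here is a minimal cosine-similarity sketch. It assumes the search strategy ranks pages by cosine similarity (the name transformers-cosine suggests this, but the actual implementation in search/ is not shown here). `PageEmbedding` and `rankPages` are hypothetical names used only for this example.

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as "not similar".
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical shape for a cached page embedding (not taken from the project).
interface PageEmbedding {
  id: string;
  vector: number[];
}

// Rank pages from most to least similar to the query embedding.
function rankPages(
  query: number[],
  pages: PageEmbedding[],
): { id: string; score: number }[] {
  return pages
    .map((p) => ({ id: p.id, score: cosineSimilarity(query, p.vector) }))
    .sort((x, y) => y.score - x.score);
}
```

The dimension check matters in practice: vectors produced by different models (or different dimensions) cannot be compared meaningfully.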
Your search is now running with the local ONNX strategy. To confirm, start your app and check the /search/test endpoint:
deno run --allow-net --allow-env --allow-read --allow-ffi main.tsx
Then visit:
http://localhost:8000/search/test?q=How%20to%20use%20Groq%20API
Look for:
{ "metadata": { "strategy": "transformers-local-onnx", "localModel": true, "modelPath": "/path/to/models/all-MiniLM-L6-v2", "timings": { "queryEmbedding": 25, "total": 65 } } }
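If you would rather check the response programmatically than by eye, a small guard like the one below could work. This is only a sketch: the `SearchMetadata` interface covers just the fields shown in the sample response above, the real response may contain more, and `isLocalOnnxActive` is a hypothetical helper name.

```typescript
// Fields taken from the sample response; the real payload may have more.
interface SearchMetadata {
  strategy: string;
  localModel: boolean;
  modelPath: string;
  timings: { queryEmbedding: number; total: number };
}

// Hypothetical helper: true only when the local ONNX strategy is in use.
function isLocalOnnxActive(
  response: { metadata?: Partial<SearchMetadata> },
): boolean {
  const m = response.metadata;
  return m?.strategy === "transformers-local-onnx" && m?.localModel === true;
}

// The sample response from this guide, parsed from JSON.
const sampleResponse = JSON.parse(`{
  "metadata": {
    "strategy": "transformers-local-onnx",
    "localModel": true,
    "modelPath": "/path/to/models/all-MiniLM-L6-v2",
    "timings": { "queryEmbedding": 25, "total": 65 }
  }
}`);

console.log(isLocalOnnxActive(sampleResponse)); // true
```

In a real check you would pass the parsed body of the /search/test response instead of the hard-coded sample.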
Cause: Model files not found
Fix: Make sure you ran the download script:
cd search/models && ./download-model.sh
Verify files exist:
ls -lh all-MiniLM-L6-v2/onnx/
You should see model.onnx (~23MB)
Cause: Wrong import path
Fix: Check search/index.ts has the correct import:
import { searchStrategy, generateEmbeddings } from "./transformers-local-onnx.ts";
Cause: Using wrong strategy (auto-download version)
Fix: Confirm the import is transformers-local-onnx.ts, not transformers-cosine.ts
Edit search/transformers-local-onnx.ts:
const USE_QUANTIZED = true; // Change from false
Benefits:
If you were using a different strategy before, regenerate embeddings:
GET /cache/recalculate?force=true
This ensures all page embeddings use the same model.
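To illustrate why a recalculation is needed, the sketch below flags cached entries that were not produced by the current model. How the app actually stores its cache is not shown in this guide, so the `CachedEmbedding` shape, the `model` tag, and `staleEntries` are all hypothetical. Only the 384-dimension figure comes from the test output.

```typescript
// Hypothetical cache entry shape; the project's real cache format may differ.
interface CachedEmbedding {
  pageId: string;
  model: string; // assumed tag recording which model produced the vector
  vector: number[];
}

// Return entries that should be regenerated: wrong model tag or wrong dimension.
function staleEntries(
  cache: CachedEmbedding[],
  currentModel: string,
  expectedDims: number,
): CachedEmbedding[] {
  return cache.filter(
    (e) => e.model !== currentModel || e.vector.length !== expectedDims,
  );
}

// Example with tiny 3-dimensional vectors for readability
// (the real model produces 384 dimensions).
const exampleCache: CachedEmbedding[] = [
  { pageId: "intro", model: "all-MiniLM-L6-v2", vector: [0.1, 0.2, 0.3] },
  { pageId: "api", model: "some-older-model", vector: [0.4, 0.5, 0.6] },
  { pageId: "faq", model: "all-MiniLM-L6-v2", vector: [0.7, 0.8] },
];

const stale = staleEntries(exampleCache, "all-MiniLM-L6-v2", 3);
console.log(stale.map((e) => e.pageId)); // [ "api", "faq" ]
```

Forcing a full recalculation, as the endpoint above does, sidesteps this bookkeeping entirely by regenerating every entry.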
| Before (transformers-cosine) | After (transformers-local-onnx) |
|---|---|
| First run: ~3-5s | First run: ~150ms |
| Cached: ~150ms | Cached: ~50ms |
| Query: ~10-30ms | Query: ~10-30ms |
| Total: ~160-180ms | Total: ~60-80ms |
| Network: Required (first run) | Network: None |
Result: ~2-3x faster, no network dependency!
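The "~2-3x" figure follows from the Total row of the table. As a quick worked check, the sketch below divides the before and after ranges. `RangeMs` and `speedupRange` are names made up for this example.

```typescript
// A latency range in milliseconds.
interface RangeMs {
  min: number;
  max: number;
}

// Worst case: fastest "before" vs slowest "after".
// Best case: slowest "before" vs fastest "after".
function speedupRange(before: RangeMs, after: RangeMs): RangeMs {
  return {
    min: before.min / after.max,
    max: before.max / after.min,
  };
}

// Totals from the table: ~160-180ms before, ~60-80ms after.
const totalSpeedup = speedupRange({ min: 160, max: 180 }, { min: 60, max: 80 });
console.log(totalSpeedup); // { min: 2, max: 3 }
```

These are warm (cached) numbers. The first-run difference in the table (~3-5s vs ~150ms) is much larger.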
- search/models/SETUP.md for detailed instructions
- search/STRATEGY-COMPARISON.md to compare all strategies
- search/README.md
- README.md (see "Search" section)
- search/models/test-local-model.ts to verify your setup