Quick Start Guide - Local ONNX Models

Get the fastest semantic search running in 3 steps.

Step 1: Download the Model (One-Time)

cd search/models
./download-model.sh

Time: ~2-5 minutes (depending on internet speed)
Size: ~90MB download
What it does: Downloads all-MiniLM-L6-v2 ONNX model from Hugging Face

Step 2: Activate the Strategy

Edit search/index.ts:

// Comment out the current strategy
// import { searchStrategy, generateEmbeddings } from "./transformers-cosine.ts";

// Uncomment the local ONNX strategy
import { searchStrategy, generateEmbeddings } from "./transformers-local-onnx.ts";
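This one-line swap works because every strategy module exports the same two names. A minimal sketch of that interchangeable-export pattern (the SearchStrategy interface and the dummy vectors below are illustrative, not the project's actual types):

```typescript
// Illustrative only: sketches the shape that each strategy module exports.
interface SearchStrategy {
  name: string;
  // Returns one embedding vector per input text.
  generateEmbeddings(texts: string[]): Promise<number[][]>;
}

// A stand-in strategy; transformers-local-onnx.ts would export the same shape
// but run the real ONNX model instead of returning dummy vectors.
const searchStrategy: SearchStrategy = {
  name: "transformers-local-onnx",
  async generateEmbeddings(texts) {
    return texts.map(() => new Array(384).fill(0));
  },
};

const [embedding] = await searchStrategy.generateEmbeddings(["hello"]);
console.log(searchStrategy.name, embedding.length); // transformers-local-onnx 384
```

Because callers go through search/index.ts and depend only on these exports, changing the import line swaps the backend without touching any call sites.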

Step 3: Test It

cd search/models
deno run --allow-read --allow-env --allow-net --allow-ffi test-local-model.ts

Note: --allow-ffi is required for ONNX runtime native bindings.

Expected output:

šŸ“‚ Loading local ONNX model from: /path/to/models/all-MiniLM-L6-v2
   Using full model
āœ… Local ONNX model loaded successfully:
   npm import: 102ms
   pipeline load: 137ms (from local files)
   total model load: 239ms

āœ… Generated embeddings successfully!
   Dimensions: 384
   First 5 values: [-0.0457, -0.0109, -0.0935, ...]
   Time: 247ms

Test 2: Generate embeddings for multiple queries (testing cache)
āœ… "How to use Groq API?"
   Time: 3.87ms (cached pipeline)

Note: You may see a harmless "mutex lock failed" error at the very end. This is a known ONNX runtime cleanup issue in Deno that only affects standalone scripts that exit immediately; a long-running server (main.tsx) is unaffected. All functionality works correctly.
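The fast repeat queries in the output above come from loading the pipeline once and reusing it. A minimal sketch of that memoization pattern, with loadPipeline as a hypothetical stand-in for the real ONNX model load:

```typescript
// Hypothetical stand-in for the expensive ONNX pipeline load.
let loads = 0;
async function loadPipeline(): Promise<(text: string) => number[]> {
  loads++;
  // Real code would load model.onnx from disk here (~150ms).
  return (text) => new Array(384).fill(text.length);
}

// Cache the in-flight promise so even concurrent callers share one load.
let cached: ReturnType<typeof loadPipeline> | null = null;
function getPipeline() {
  return (cached ??= loadPipeline());
}

// First call pays the load cost; later calls reuse the cached pipeline.
const embed1 = await getPipeline();
const embed2 = await getPipeline();
console.log(loads, embed1 === embed2); // 1 true
```

Memoizing the promise (rather than the resolved value) is what keeps two queries that arrive at the same time from both triggering a model load.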

That's It! šŸŽ‰

Your search is now running with:

  • āœ… No network calls (100% offline)
  • āœ… No API keys needed
  • āœ… ~10-30ms query time
  • āœ… ~60-80ms total search time

Verify in Your App

Start your app and check /search/test endpoint:

deno run --allow-net --allow-env --allow-read main.tsx

Then visit:

http://localhost:8000/search/test?q=How%20to%20use%20Groq%20API

Look for:

{
  "metadata": {
    "strategy": "transformers-local-onnx",
    "localModel": true,
    "modelPath": "/path/to/models/all-MiniLM-L6-v2",
    "timings": {
      "queryEmbedding": 25,
      "total": 65
    }
  }
}
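For context on the timings: cosine-based strategies rank pages by cosine similarity between the query embedding and each page embedding. A minimal sketch of that ranking step (toy 3-dimensional vectors for brevity; not the project's actual code):

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank pages by similarity to the query embedding.
const query = [1, 0, 0];
const pages = [
  { path: "/groq-api", embedding: [0.9, 0.1, 0] },
  { path: "/unrelated", embedding: [0, 1, 0] },
];
const ranked = pages
  .map((p) => ({ path: p.path, score: cosineSimilarity(query, p.embedding) }))
  .sort((a, b) => b.score - a.score);
console.log(ranked[0].path); // /groq-api
```

If the embeddings are already normalized to unit length, the cosine reduces to a plain dot product, which is cheaper still.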

Troubleshooting

"Failed to load local ONNX model"

Cause: Model files not found
Fix: Make sure you ran the download script:

cd search/models
./download-model.sh

Verify files exist:

ls -lh all-MiniLM-L6-v2/onnx/

You should see model.onnx (~23MB).

"Module not found"

Cause: Wrong import path
Fix: Check search/index.ts has the correct import:

import { searchStrategy, generateEmbeddings } from "./transformers-local-onnx.ts";

Still slow (~3-5s first query)

Cause: Using wrong strategy (auto-download version)
Fix: Confirm the import is transformers-local-onnx.ts, not transformers-cosine.ts

Next Steps

Optional: Use Quantized Model (Faster, Smaller)

Edit search/transformers-local-onnx.ts:

const USE_QUANTIZED = true; // Change from false

Benefits:

  • Smaller: ~6MB vs ~23MB
  • Faster: ~15-20ms vs ~25-30ms per query
  • Slightly less accurate: ~57 vs ~58 MTEB score
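The size and accuracy differences come from storing weights as 8-bit integers instead of 32-bit floats. A toy illustration of the idea (a single-scale scheme for clarity; not the actual ONNX quantization):

```typescript
// Quantize float weights to int8 with one scale factor, then dequantize.
function quantize(weights: number[]): { q: Int8Array; scale: number } {
  const maxAbs = Math.max(...weights.map(Math.abs));
  const scale = maxAbs / 127;
  const q = Int8Array.from(weights, (w) => Math.round(w / scale));
  return { q, scale };
}

const weights = [0.12, -0.5, 0.031, 0.25];
const { q, scale } = quantize(weights);
const restored = Array.from(q, (v) => v * scale);

// Int8 storage is 4x smaller than Float32; restored values are close
// to the originals but not exact, hence the small accuracy drop.
console.log(q.length * 1, "bytes vs", weights.length * 4, "bytes");
console.log(restored.map((v) => v.toFixed(3)));
```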

Optional: Recalculate Embeddings

If you were using a different strategy before, regenerate embeddings:

GET /cache/recalculate?force=true

This ensures all page embeddings use the same model.

Performance Comparison

              Before (transformers-cosine)    After (transformers-local-onnx)
First run     ~3-5s                           ~150ms
Cached        ~150ms                          ~50ms
Query         ~10-30ms                        ~10-30ms
Total         ~160-180ms                      ~60-80ms
Network       Required (first run)            None

Result: ~2-3x faster, no network dependency! šŸš€

Resources

  • Full Setup Guide: search/models/SETUP.md
  • Strategy Comparison: search/STRATEGY-COMPARISON.md
  • Search Module Docs: search/README.md
  • Main Docs: README.md (see "Search" section)

Need Help?

  1. Check search/models/SETUP.md for detailed instructions
  2. Read search/STRATEGY-COMPARISON.md to compare all strategies
  3. Run search/models/test-local-model.ts to verify your setup
  4. Check the console for error messages with helpful hints