Quick Start Guide - Local ONNX Models

Get the fastest semantic search running in 3 steps.

Step 1: Download the Model (One-Time)

cd search/models
./download-model.sh

Time: ~2-5 minutes (depending on internet speed)
Size: ~90MB download
What it does: Downloads all-MiniLM-L6-v2 ONNX model from Hugging Face

Step 2: Activate the Strategy

Edit search/index.ts:

// Comment out the current strategy
// import { searchStrategy, generateEmbeddings } from "./transformers-cosine.ts";

// Uncomment the local ONNX strategy
import { searchStrategy, generateEmbeddings } from "./transformers-local-onnx.ts";
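This one-line swap works because every strategy module exports the same two names. A minimal sketch of that interchangeable-export pattern (the SearchStrategy interface and the dummy vectors below are illustrative, not the project's actual types):

```typescript
// Illustrative only: sketches the shape that each strategy module exports.
interface SearchStrategy {
  name: string;
  // Returns one embedding vector per input text.
  generateEmbeddings(texts: string[]): Promise<number[][]>;
}

// A stand-in strategy; transformers-local-onnx.ts would export the same shape
// but run the real ONNX model instead of returning dummy vectors.
const searchStrategy: SearchStrategy = {
  name: "transformers-local-onnx",
  async generateEmbeddings(texts) {
    return texts.map(() => new Array(384).fill(0));
  },
};

const [embedding] = await searchStrategy.generateEmbeddings(["hello"]);
console.log(searchStrategy.name, embedding.length); // transformers-local-onnx 384
```

Because callers go through search/index.ts and depend only on these exports, changing the import line swaps the backend without touching any call sites.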

Step 3: Test It

cd search/models
deno run --allow-read --allow-env --allow-net --allow-ffi test-local-model.ts

Note: --allow-ffi is required for ONNX runtime native bindings.

Expected output:

šŸ“‚ Loading local ONNX model from: /path/to/models/all-MiniLM-L6-v2
   Using full model
āœ… Local ONNX model loaded successfully:
   npm import: 102ms
   pipeline load: 137ms (from local files)
   total model load: 239ms

āœ… Generated embeddings successfully!
   Dimensions: 384
   First 5 values: [-0.0457, -0.0109, -0.0935, ...]
   Time: 247ms

Test 2: Generate embeddings for multiple queries (testing cache)
āœ… "How to use Groq API?"
   Time: 3.87ms (cached pipeline)

Note: You may see a harmless "mutex lock failed" error at the very end. This is a known ONNX runtime cleanup issue in Deno that only affects standalone scripts that exit immediately; a long-running server (main.tsx) is unaffected. All functionality works correctly.
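The fast repeat queries in the output above come from loading the pipeline once and reusing it. A minimal sketch of that memoization pattern, with loadPipeline as a hypothetical stand-in for the real ONNX model load:

```typescript
// Hypothetical stand-in for the expensive ONNX pipeline load.
let loads = 0;
async function loadPipeline(): Promise<(text: string) => number[]> {
  loads++;
  // Real code would load model.onnx from disk here (~150ms).
  return (text) => new Array(384).fill(text.length);
}

// Cache the in-flight promise so even concurrent callers share one load.
let cached: ReturnType<typeof loadPipeline> | null = null;
function getPipeline() {
  return (cached ??= loadPipeline());
}

// First call pays the load cost; later calls reuse the cached pipeline.
const embed1 = await getPipeline();
const embed2 = await getPipeline();
console.log(loads, embed1 === embed2); // 1 true
```

Memoizing the promise (rather than the resolved value) is what keeps two queries that arrive at the same time from both triggering a model load.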

That's It! šŸŽ‰

Your search is now running with:

  • āœ… No network calls (100% offline)
  • āœ… No API keys needed
  • āœ… ~10-30ms query time
  • āœ… ~60-80ms total search time

Verify in Your App

Start your app and check /search/test endpoint:

deno run --allow-net --allow-env --allow-read main.tsx

Then visit:

http://localhost:8000/search/test?q=How%20to%20use%20Groq%20API

Look for:

{
  "metadata": {
    "strategy": "transformers-local-onnx",
    "localModel": true,
    "modelPath": "/path/to/models/all-MiniLM-L6-v2",
    "timings": {
      "queryEmbedding": 25,
      "total": 65
    }
  }
}
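For context on the timings: cosine-based strategies rank pages by cosine similarity between the query embedding and each page embedding. A minimal sketch of that ranking step (toy 3-dimensional vectors for brevity; not the project's actual code):

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank pages by similarity to the query embedding.
const query = [1, 0, 0];
const pages = [
  { path: "/groq-api", embedding: [0.9, 0.1, 0] },
  { path: "/unrelated", embedding: [0, 1, 0] },
];
const ranked = pages
  .map((p) => ({ path: p.path, score: cosineSimilarity(query, p.embedding) }))
  .sort((a, b) => b.score - a.score);
console.log(ranked[0].path); // /groq-api
```

If the embeddings are already normalized to unit length, the cosine reduces to a plain dot product, which is cheaper still.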

Troubleshooting

"Failed to load local ONNX model"

Cause: Model files not found
Fix: Make sure you ran the download script:

cd search/models
./download-model.sh

Verify files exist:

ls -lh all-MiniLM-L6-v2/onnx/

You should see model.onnx (~23MB).

"Module not found"

Cause: Wrong import path
Fix: Check search/index.ts has the correct import:

import { searchStrategy, generateEmbeddings } from "./transformers-local-onnx.ts";

Still slow (~3-5s first query)

Cause: Using wrong strategy (auto-download version)
Fix: Confirm the import is transformers-local-onnx.ts, not transformers-cosine.ts

Next Steps

Optional: Use Quantized Model (Faster, Smaller)

Edit search/transformers-local-onnx.ts:

const USE_QUANTIZED = true; // Change from false

Benefits:

  • Smaller: ~6MB vs ~23MB
  • Faster: ~15-20ms vs ~25-30ms per query
  • Slightly less accurate: ~57 vs ~58 MTEB score
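The size and accuracy differences come from storing weights as 8-bit integers instead of 32-bit floats. A toy illustration of the idea (a single-scale scheme for clarity; not the actual ONNX quantization):

```typescript
// Quantize float weights to int8 with one scale factor, then dequantize.
function quantize(weights: number[]): { q: Int8Array; scale: number } {
  const maxAbs = Math.max(...weights.map(Math.abs));
  const scale = maxAbs / 127;
  const q = Int8Array.from(weights, (w) => Math.round(w / scale));
  return { q, scale };
}

const weights = [0.12, -0.5, 0.031, 0.25];
const { q, scale } = quantize(weights);
const restored = Array.from(q, (v) => v * scale);

// Int8 storage is 4x smaller than Float32; restored values are close
// to the originals but not exact, hence the small accuracy drop.
console.log(q.length * 1, "bytes vs", weights.length * 4, "bytes");
console.log(restored.map((v) => v.toFixed(3)));
```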

Optional: Recalculate Embeddings

If you were using a different strategy before, regenerate embeddings:

GET /cache/recalculate?force=true

This ensures all page embeddings use the same model.

Performance Comparison

              Before (transformers-cosine)    After (transformers-local-onnx)
First run     ~3-5s                           ~150ms
Cached        ~150ms                          ~50ms
Query         ~10-30ms                        ~10-30ms
Total         ~160-180ms                      ~60-80ms
Network       Required (first run)            None

Result: ~2-3x faster, no network dependency! šŸš€

Resources

  • Full Setup Guide: search/models/SETUP.md
  • Strategy Comparison: search/STRATEGY-COMPARISON.md
  • Search Module Docs: search/README.md
  • Main Docs: README.md (see "Search" section)

Need Help?

  1. Check search/models/SETUP.md for detailed instructions
  2. Read search/STRATEGY-COMPARISON.md to compare all strategies
  3. Run search/models/test-local-model.ts to verify your setup
  4. Check the console for error messages with helpful hints