Local ONNX Model Setup Guide

This guide walks through setting up pre-downloaded ONNX models for the fastest embedding generation, with zero network overhead.

Quick Start

```sh
# Navigate to the models directory
cd search/models

# Run the download script
./download-model.sh

# Wait for the download to complete (it may take a few minutes depending on
# your connection); the model is ~90MB total.
```

Manual Download (Alternative)

If the script doesn't work, you can download manually:

```sh
cd search/models

# Clone the Hugging Face repository
git clone https://huggingface.co/Xenova/all-MiniLM-L6-v2

# Remove the .git directory to save space
rm -rf all-MiniLM-L6-v2/.git
```

Or download individual files from: https://huggingface.co/Xenova/all-MiniLM-L6-v2/tree/main

Required files:

  • onnx/model.onnx (~23MB) - Main ONNX model
  • onnx/model_quantized.onnx (~6MB) - Quantized version (optional, faster)
  • tokenizer.json - Tokenizer vocabulary
  • tokenizer_config.json - Tokenizer configuration
  • config.json - Model configuration

Verify Installation

Check that the model files exist:

```sh
ls -lh all-MiniLM-L6-v2/
ls -lh all-MiniLM-L6-v2/onnx/
```

You should see:

```
all-MiniLM-L6-v2/
├── onnx/
│   ├── model.onnx              (~23MB)
│   └── model_quantized.onnx    (~6MB)
├── config.json
├── tokenizer.json
├── tokenizer_config.json
└── ...
```
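
If you prefer a scripted check, here is a minimal Deno sketch (verify-model.ts is a hypothetical file name; the required paths come from the list above):

```ts
// verify-model.ts — run from search/models with:
//   deno run --allow-read verify-model.ts
const required = [
  "all-MiniLM-L6-v2/onnx/model.onnx",
  "all-MiniLM-L6-v2/tokenizer.json",
  "all-MiniLM-L6-v2/tokenizer_config.json",
  "all-MiniLM-L6-v2/config.json",
];

for (const path of required) {
  try {
    const info = await Deno.stat(path); // throws if the file is missing
    console.log(`ok      ${path} (${(info.size / 1e6).toFixed(1)} MB)`);
  } catch {
    console.error(`MISSING ${path}`);
  }
}
```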

Switch to Local Model Strategy

In search/index.ts, make sure this line is active:

```ts
import { searchStrategy, generateEmbeddings } from "./transformers-local-onnx.ts";
```

Comment out other strategies:

```ts
// import { searchStrategy, generateEmbeddings } from "./transformers-cosine.ts";
```

Configuration Options

Edit search/transformers-local-onnx.ts to customize:

```ts
// Use the quantized model (faster, slightly less accurate)
const USE_QUANTIZED = true; // Default: false

// Model path (change if you want to use a different model)
const MODEL_PATH = new URL("./models/all-MiniLM-L6-v2", import.meta.url).pathname;
```

Performance Comparison

| Strategy | First Run | Cached | Network | Model Size |
| --- | --- | --- | --- | --- |
| transformers-cosine.ts | ~3-5s | ~150ms | Required (downloads from HF) | ~23MB cached |
| transformers-local-onnx.ts | ~150ms | ~50ms | None | ~23MB local |

The local ONNX version:

  • ✅ No network calls - works completely offline
  • ✅ No downloads on first run - instant startup
  • ✅ No isolate loading delays
  • ✅ Same accuracy as the cached version
  • ✅ Perfect for serverless/edge deployments
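
To reproduce rough numbers like these yourself, here is a quick timing sketch (this assumes generateEmbeddings accepts an array of strings; check its actual signature in transformers-local-onnx.ts):

```ts
// bench.ts — a rough timing sketch, not a rigorous benchmark.
import { generateEmbeddings } from "./transformers-local-onnx.ts";

const texts = ["hello world", "local ONNX models avoid network overhead"];

let t = performance.now();
await generateEmbeddings(texts); // first call loads the model from disk
console.log(`first run: ${(performance.now() - t).toFixed(0)}ms`);

t = performance.now();
await generateEmbeddings(texts); // later calls reuse the loaded session
console.log(`cached:    ${(performance.now() - t).toFixed(0)}ms`);
```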

Deployment Notes

For Val.town

  • Upload the model files to Val.town's blob storage or use a CDN (see the sketch after this list)
  • Update MODEL_PATH to point to the remote location
  • Or use the regular transformers-cosine.ts, which downloads the model to the isolate cache
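
A hedged sketch of the blob-storage option, assuming Val.town's standard blob API (blob.set taking a BodyInit, blob.get returning a Response); verify the exact API against the current Val.town docs:

```ts
import { blob } from "https://esm.town/v/std/blob";

const KEY = "all-MiniLM-L6-v2/model_quantized.onnx";

// One-time upload: stream the quantized model from Hugging Face into blob storage.
const res = await fetch(
  "https://huggingface.co/Xenova/all-MiniLM-L6-v2/resolve/main/onnx/model_quantized.onnx",
);
await blob.set(KEY, res.body!); // blob.set is assumed to accept any BodyInit

// At runtime: read the model bytes back out of blob storage.
const stored = await blob.get(KEY); // assumed to return a Response
const modelBytes = new Uint8Array(await stored.arrayBuffer());
```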

For Deno Deploy

  • Include the models in your deployment package
  • Models are read from the file system at runtime (see the sketch below)
  • Works perfectly with the local strategy
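
As a quick sanity check, files shipped with the deployment are readable with standard Deno file APIs (the path here mirrors MODEL_PATH above, resolved relative to search/):

```ts
// Read the bundled model at startup to confirm it shipped with the deployment.
const modelBytes = await Deno.readFile(
  new URL("./models/all-MiniLM-L6-v2/onnx/model.onnx", import.meta.url),
);
console.log(`model on disk: ${(modelBytes.byteLength / 1e6).toFixed(1)} MB`);
```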

For Cloudflare Workers

  • Workers have a ~1MB script size limit (on the free plan), so the local strategy won't work
  • Use cloudflare-bge-cosine.ts instead, which calls Cloudflare's Workers AI (see the sketch below)
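
For reference, generating embeddings with Workers AI looks roughly like this; a minimal sketch, assuming an AI binding in wrangler.toml and the @cf/baai/bge-base-en-v1.5 model (not necessarily what cloudflare-bge-cosine.ts does internally):

```ts
export default {
  async fetch(_req: Request, env: { AI: any }): Promise<Response> {
    // The AI binding is configured in wrangler.toml ([ai] binding = "AI").
    // bge-base-en-v1.5 returns 768-dimensional embeddings in data[i].
    const { data } = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
      text: ["hello world"],
    });
    return Response.json({ embedding: data[0] });
  },
};
```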

For Render / Railway / Fly.io

  • Include the models directory in your Docker image
  • Works perfectly with local file system access
  • Fastest option for these platforms

Troubleshooting

Error: "Failed to load local ONNX model"

  • Ensure model files are downloaded: ls -lh search/models/all-MiniLM-L6-v2/
  • Run the download script: ./search/models/download-model.sh
  • Check file permissions: chmod -R 755 search/models/all-MiniLM-L6-v2/

Error: "Cannot find module"

  • Make sure you're using the correct import in search/index.ts
  • Check that the file path is correct

Slow performance

  • Try the quantized model: set USE_QUANTIZED = true in transformers-local-onnx.ts
  • Check if model files are on a fast disk (SSD recommended)

Out of memory

  • The model requires ~100-200MB RAM when loaded
  • Use the quantized version to reduce memory usage

Alternative: Use Different Models

You can use other ONNX models from Hugging Face. Popular options:

  • all-MiniLM-L6-v2 (default) - 384 dims, 23MB, fast
  • all-mpnet-base-v2 - 768 dims, 420MB, more accurate
  • paraphrase-multilingual-MiniLM-L12-v2 - 384 dims, 470MB, multilingual

To use a different model:

  1. Download it to search/models/[model-name]
  2. Update MODEL_PATH in transformers-local-onnx.ts
  3. Update the embedding dimension in your data pipeline if needed (see the sketch below)
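
For example, switching to all-mpnet-base-v2 would look roughly like this in transformers-local-onnx.ts (EMBEDDING_DIM is a hypothetical name for wherever your pipeline records the vector size):

```ts
// Point MODEL_PATH at the newly downloaded model directory.
const MODEL_PATH = new URL("./models/all-mpnet-base-v2", import.meta.url).pathname;

// Keep the stored vector size in sync with the model:
// all-mpnet-base-v2 emits 768-dim vectors; the MiniLM models emit 384.
const EMBEDDING_DIM = 768; // hypothetical constant in your data pipeline
```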

Clean Up

To remove the model and reclaim disk space:

```sh
cd search/models
rm -rf all-MiniLM-L6-v2
```

Then switch back to a different strategy in search/index.ts.
