This guide will help you set up pre-downloaded ONNX models for the fastest embedding generation with zero network overhead.
```bash
# Navigate to the models directory
cd search/models

# Run the download script
./download-model.sh

# Wait for download to complete (may take a few minutes depending on connection)
# The model is ~90MB total
```
If the script doesn't work, you can download manually:
```bash
cd search/models

# Clone the Hugging Face repository
git clone https://huggingface.co/Xenova/all-MiniLM-L6-v2

# Remove .git directory to save space
rm -rf all-MiniLM-L6-v2/.git
```
Or download individual files from: https://huggingface.co/Xenova/all-MiniLM-L6-v2/tree/main
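If you prefer to script the individual downloads, here is a minimal sketch, assuming a Deno runtime (suggested by the `.ts` imports and `import.meta.url` used elsewhere in this guide). It pulls the files listed under Required files below via Hugging Face's `resolve/main` endpoint:

```typescript
// Sketch: fetch the required model files one by one from Hugging Face.
// Run from search/models/. File list assumed from "Required files" below.
const BASE = "https://huggingface.co/Xenova/all-MiniLM-L6-v2/resolve/main/";
const FILES = [
  "config.json",
  "tokenizer.json",
  "tokenizer_config.json",
  "onnx/model.onnx",
  "onnx/model_quantized.onnx",
];

for (const file of FILES) {
  const res = await fetch(BASE + file);
  if (!res.ok) throw new Error(`${file}: HTTP ${res.status}`);
  const dest = `all-MiniLM-L6-v2/${file}`;
  await Deno.mkdir(dest.slice(0, dest.lastIndexOf("/")), { recursive: true });
  await Deno.writeFile(dest, new Uint8Array(await res.arrayBuffer()));
  console.log(`downloaded ${dest}`);
}
```

Run it with `deno run --allow-net --allow-write fetch-model.ts` (the filename is just an example).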
Required files:
- `onnx/model.onnx` (~23MB) - Main ONNX model
- `onnx/model_quantized.onnx` (~6MB) - Quantized version (optional, faster)
- `tokenizer.json` - Tokenizer vocabulary
- `tokenizer_config.json` - Tokenizer configuration
- `config.json` - Model configuration

Check that the model files exist:
```bash
ls -lh all-MiniLM-L6-v2/
ls -lh all-MiniLM-L6-v2/onnx/
```
You should see:
```
all-MiniLM-L6-v2/
├── onnx/
│   ├── model.onnx (~23MB)
│   └── model_quantized.onnx (~6MB)
├── config.json
├── tokenizer.json
├── tokenizer_config.json
└── ...
```
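To check the same layout programmatically, here is a small sketch, assuming a Deno runtime and a script placed in `search/` (adjust the path if your layout differs):

```typescript
// Sketch: verify the required model files exist and report their sizes.
// Assumes this file lives in search/ so ./models/ resolves correctly.
const MODEL_DIR = new URL("./models/all-MiniLM-L6-v2/", import.meta.url).pathname;

const required = [
  "config.json",
  "tokenizer.json",
  "tokenizer_config.json",
  "onnx/model.onnx",
];

for (const file of required) {
  try {
    const { size } = await Deno.stat(MODEL_DIR + file);
    console.log(`ok      ${file} (${(size / 1_000_000).toFixed(1)} MB)`);
  } catch {
    console.error(`missing ${file} (re-run download-model.sh)`);
  }
}
```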
In `search/index.ts`, make sure this line is active:
```typescript
import { searchStrategy, generateEmbeddings } from "./transformers-local-onnx.ts";
```
Comment out other strategies:
```typescript
// import { searchStrategy, generateEmbeddings } from "./transformers-cosine.ts";
```
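To sanity-check the switch, you can call `generateEmbeddings` directly. This is a hypothetical usage sketch; the exact signature (array of strings in, array of vectors out) is an assumption, not confirmed by this guide:

```typescript
// Hypothetical usage: the generateEmbeddings signature shown here is assumed.
import { generateEmbeddings } from "./transformers-local-onnx.ts";

const [vec] = await generateEmbeddings(["where is the model stored?"]);
console.log(vec.length); // all-MiniLM-L6-v2 produces 384-dimensional embeddings
```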
Edit `search/transformers-local-onnx.ts` to customize:
```typescript
// Use quantized model (faster, slightly less accurate)
const USE_QUANTIZED = true; // Default: false

// Model path (change if you want to use a different model)
const MODEL_PATH = new URL("./models/all-MiniLM-L6-v2", import.meta.url).pathname;
```
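For context, strategies like this typically load the model through transformers.js. Here is a sketch of how `MODEL_PATH` and `USE_QUANTIZED` might be wired up, assuming the `@xenova/transformers` package; this is illustrative, not the project's actual code:

```typescript
// Illustrative sketch only; the real transformers-local-onnx.ts may differ.
// Assumes the @xenova/transformers package (transformers.js).
import { env, pipeline } from "@xenova/transformers";

// Resolve models from the local directory and never hit the network.
env.localModelPath = new URL("./models/", import.meta.url).pathname;
env.allowRemoteModels = false;

const USE_QUANTIZED = false;

// quantized: true loads onnx/model_quantized.onnx, false loads onnx/model.onnx.
const extractor = await pipeline("feature-extraction", "all-MiniLM-L6-v2", {
  quantized: USE_QUANTIZED,
});

// Mean-pool and normalize to get one sentence embedding per input.
const output = await extractor("hello world", { pooling: "mean", normalize: true });
console.log(output.data.length); // 384
```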
| Strategy | First Run | Cached | Network | Model Size |
|---|---|---|---|---|
| `transformers-cosine.ts` | ~3-5s | ~150ms | Required (downloads from HF) | ~23MB cached |
| `transformers-local-onnx.ts` | ~150ms | ~50ms | None | ~23MB local |
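Both strategies score matches with cosine similarity between embedding vectors. A minimal sketch of that computation (illustrative; the project's scoring code may differ):

```typescript
// Cosine similarity between two embedding vectors.
// For normalized embeddings this reduces to a plain dot product.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank documents by similarity to a query embedding (illustrative).
function rank(queryVec: number[], docVecs: number[][]): number[] {
  return docVecs
    .map((vec, i) => ({ i, score: cosineSimilarity(queryVec, vec) }))
    .sort((x, y) => y.score - x.score)
    .map(({ i }) => i);
}
```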
The local ONNX version needs no network at all: the model loads straight from disk (~150ms on first run, ~50ms cached).

If you don't want to keep the model on disk, you can change `MODEL_PATH` to point to the remote location, switch to `transformers-cosine.ts` (which downloads the model to its own cache), or use `cloudflare-bge-cosine.ts` instead (uses Cloudflare AI).

If the local model fails to load:

- Check that the files exist: `ls -lh search/models/all-MiniLM-L6-v2/`
- Re-run the download script: `./search/models/download-model.sh`
- Fix permissions: `chmod -R 755 search/models/all-MiniLM-L6-v2/`
- Confirm the local ONNX import is the active one in `search/index.ts`
- If generation feels slow, set `USE_QUANTIZED = true` in `transformers-local-onnx.ts`

You can use other ONNX models from Hugging Face.
To use a different model:
1. Download (or clone) the model into `search/models/[model-name]`
2. Update `MODEL_PATH` in `transformers-local-onnx.ts`

To remove the model and reclaim disk space:
```bash
cd search/models
rm -rf all-MiniLM-L6-v2
```
Then switch to a different strategy in `search/index.ts`.