This guide will help you set up pre-downloaded ONNX models for the fastest embedding generation with zero network overhead.
```bash
# Navigate to the models directory
cd search/models

# Run the download script
./download-model.sh

# Wait for the download to complete (may take a few minutes depending on your connection)
# The model is ~90MB total
```
If the script doesn't work, you can download manually:
```bash
cd search/models

# Clone the Hugging Face repository
git clone https://huggingface.co/Xenova/all-MiniLM-L6-v2

# Remove the .git directory to save space
rm -rf all-MiniLM-L6-v2/.git
```
Or download individual files from: https://huggingface.co/Xenova/all-MiniLM-L6-v2/tree/main
Required files:
- `onnx/model.onnx` (~23MB) - Main ONNX model
- `onnx/model_quantized.onnx` (~6MB) - Quantized version (optional, faster)
- `tokenizer.json` - Tokenizer vocabulary
- `tokenizer_config.json` - Tokenizer configuration
- `config.json` - Model configuration
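If you'd rather script the download than use curl or git, a rough Deno/TypeScript sketch is below. It is not the contents of `download-model.sh`; the file list comes from above, and the `resolve/main` URL pattern is Hugging Face's standard raw-file endpoint.

```ts
// Sketch: fetch the required model files from Hugging Face with Deno.
const BASE = "https://huggingface.co/Xenova/all-MiniLM-L6-v2/resolve/main";
const FILES = [
  "onnx/model.onnx",
  "onnx/model_quantized.onnx",
  "tokenizer.json",
  "tokenizer_config.json",
  "config.json",
];

for (const file of FILES) {
  const res = await fetch(`${BASE}/${file}`);
  if (!res.ok) throw new Error(`Failed to fetch ${file}: ${res.status}`);
  const dest = `all-MiniLM-L6-v2/${file}`;
  // Create the destination directory (e.g. all-MiniLM-L6-v2/onnx) if needed
  await Deno.mkdir(dest.substring(0, dest.lastIndexOf("/")), { recursive: true });
  await Deno.writeFile(dest, new Uint8Array(await res.arrayBuffer()));
  console.log(`Downloaded ${dest}`);
}
```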
Check that the model files exist:
```bash
ls -lh all-MiniLM-L6-v2/
ls -lh all-MiniLM-L6-v2/onnx/
```
You should see:
```
all-MiniLM-L6-v2/
├── onnx/
│   ├── model.onnx            (~23MB)
│   └── model_quantized.onnx  (~6MB)
├── config.json
├── tokenizer.json
├── tokenizer_config.json
└── ...
```
In `search/index.ts`, make sure this line is active:
```ts
import { searchStrategy, generateEmbeddings } from "./transformers-local-onnx.ts";
```
Comment out other strategies:
```ts
// import { searchStrategy, generateEmbeddings } from "./transformers-cosine.ts";
```
Edit `search/transformers-local-onnx.ts` to customize:

```ts
// Use the quantized model (faster, slightly less accurate)
const USE_QUANTIZED = true; // Default: false

// Model path (change if you want to use a different model)
const MODEL_PATH = new URL("./models/all-MiniLM-L6-v2", import.meta.url).pathname;
```
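For orientation, the core of a local-loading strategy can be sketched roughly like this with `@xenova/transformers`. This is an illustration assuming the project uses transformers.js (as the strategy names suggest); the real `transformers-local-onnx.ts` may differ in detail.

```ts
import { env, pipeline } from "@xenova/transformers";

// Point transformers.js at the local models directory and forbid network fetches
env.localModelPath = new URL("./models", import.meta.url).pathname;
env.allowRemoteModels = false;

const USE_QUANTIZED = false;

// The pipeline resolves "all-MiniLM-L6-v2" relative to env.localModelPath
const extractor = await pipeline("feature-extraction", "all-MiniLM-L6-v2", {
  quantized: USE_QUANTIZED,
});

export async function generateEmbeddings(texts: string[]): Promise<number[][]> {
  // Mean-pool token embeddings and L2-normalize, as the model card recommends
  const output = await extractor(texts, { pooling: "mean", normalize: true });
  return output.tolist();
}
```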
| Strategy | First Run | Cached | Network | Model Size |
|---|---|---|---|---|
| `transformers-cosine.ts` | ~3-5s | ~150ms | Required (downloads from HF) | ~23MB cached |
| `transformers-local-onnx.ts` | ~150ms | ~50ms | None | ~23MB local |
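The "cosine" in these strategy names refers to cosine similarity between embedding vectors, which is how results are ranked; a minimal reference implementation:

```ts
// Cosine similarity between two embedding vectors: 1 = same direction,
// 0 = orthogonal. With normalized embeddings this reduces to a dot product.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```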
The local ONNX version:
- ✅ No network calls - works completely offline
- ✅ No downloads on first run - instant startup
- ✅ No isolate loading delays
- ✅ Same accuracy as the cached version
- ✅ Perfect for serverless/edge deployments
Deployment notes for specific platforms:

**Val Town:**
- Upload the model files to Val Town's blob storage or use a CDN (see the sketch after this list)
- Update `MODEL_PATH` to point to the remote location
- Or use the regular `transformers-cosine.ts`, which downloads to the isolate cache
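A hedged sketch of the CDN approach, assuming a hypothetical CDN URL and a runtime that allows writes to `/tmp` (the blob-storage variant would swap `fetch` for Val Town's blob client):

```ts
// Hypothetical: mirror the model files from a CDN into /tmp on cold start,
// then point MODEL_PATH at the local copy. CDN_BASE is an assumption.
const CDN_BASE = "https://your-cdn.example.com/all-MiniLM-L6-v2";
const LOCAL_DIR = "/tmp/all-MiniLM-L6-v2";

export async function ensureLocalModel(files: string[]): Promise<string> {
  for (const file of files) {
    const dest = `${LOCAL_DIR}/${file}`;
    try {
      await Deno.stat(dest); // already cached from a previous invocation
    } catch {
      const res = await fetch(`${CDN_BASE}/${file}`);
      if (!res.ok) throw new Error(`Failed to fetch ${file}`);
      await Deno.mkdir(dest.substring(0, dest.lastIndexOf("/")), { recursive: true });
      await Deno.writeFile(dest, new Uint8Array(await res.arrayBuffer()));
    }
  }
  return LOCAL_DIR; // use as MODEL_PATH
}
```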
**Other platforms with file-system access:**
- Include the models in your deployment package
- Models will be read from the file system
- Works perfectly with the local strategy
**Cloudflare Workers:**
- Workers have a 1MB size limit, so the local strategy won't work
- Use `cloudflare-bge-cosine.ts` instead (uses Cloudflare Workers AI; see the sketch below)
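For context, a minimal Workers AI embedding call looks roughly like this. It assumes an `AI` binding configured in `wrangler.toml`; `cloudflare-bge-cosine.ts` may structure this differently.

```ts
// Sketch of a Cloudflare Workers AI embedding call using the BGE model.
interface Env {
  AI: { run(model: string, input: unknown): Promise<{ data: number[][] }> };
}

export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    const { texts } = await req.json() as { texts: string[] };
    // "@cf/baai/bge-base-en-v1.5" returns one 768-dim vector per input text
    const result = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text: texts });
    return Response.json(result.data);
  },
};
```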
**Docker / self-hosted:**
- Include the models directory in your Docker image
- Works perfectly with local file system access
- Fastest option for these platforms
**Model files not found:**
- Ensure the model files are downloaded: `ls -lh search/models/all-MiniLM-L6-v2/`
- Run the download script: `./search/models/download-model.sh`
- Check file permissions: `chmod -R 755 search/models/all-MiniLM-L6-v2/`

**Wrong strategy being loaded:**
- Make sure you're using the correct import in `search/index.ts`
- Check that the file path is correct

**Slow embedding generation:**
- Try the quantized model: set `USE_QUANTIZED = true` in `transformers-local-onnx.ts`
- Check that the model files are on a fast disk (SSD recommended)

**High memory usage:**
- The model requires ~100-200MB of RAM when loaded
- Use the quantized version to reduce memory usage
You can use other ONNX models from Hugging Face. Popular options:
- `all-MiniLM-L6-v2` (default) - 384 dims, 23MB, fast
- `all-mpnet-base-v2` - 768 dims, 420MB, more accurate
- `paraphrase-multilingual-MiniLM-L12-v2` - 384 dims, 470MB, multilingual
To use a different model:
- Download it to `search/models/[model-name]`
- Update `MODEL_PATH` in `transformers-local-onnx.ts`
- Update the embedding dimension in your data pipeline if needed (see the quick check below)
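A quick way to confirm the new dimension after swapping models, using the `generateEmbeddings` export from above (assuming it returns one vector per input text):

```ts
import { generateEmbeddings } from "./transformers-local-onnx.ts";

// One-off sanity check: print the embedding dimension of the active model
const [vec] = await generateEmbeddings(["dimension check"]);
console.log(vec.length); // 384 for MiniLM variants, 768 for all-mpnet-base-v2
```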
To remove the model and reclaim disk space:
```bash
cd search/models
rm -rf all-MiniLM-L6-v2
```
Then switch back to a different strategy in `search/index.ts`.
