Local ONNX Model Setup Guide

This guide walks through setting up pre-downloaded ONNX models for the fastest embedding generation, with zero network overhead.

Quick Start

```sh
# Navigate to the models directory
cd search/models

# Run the download script
./download-model.sh

# Wait for the download to complete (it may take a few minutes depending on
# your connection); the model is ~90MB total.
```

Manual Download (Alternative)

If the script doesn't work, you can download manually:

```sh
cd search/models

# Clone the Hugging Face repository
git clone https://huggingface.co/Xenova/all-MiniLM-L6-v2

# Remove the .git directory to save space
rm -rf all-MiniLM-L6-v2/.git
```

Or download individual files from: https://huggingface.co/Xenova/all-MiniLM-L6-v2/tree/main

Required files:

  • onnx/model.onnx (~23MB) - Main ONNX model
  • onnx/model_quantized.onnx (~6MB) - Quantized version (optional, faster)
  • tokenizer.json - Tokenizer vocabulary
  • tokenizer_config.json - Tokenizer configuration
  • config.json - Model configuration

Verify Installation

Check that the model files exist:

```sh
ls -lh all-MiniLM-L6-v2/
ls -lh all-MiniLM-L6-v2/onnx/
```

You should see:

```
all-MiniLM-L6-v2/
├── onnx/
│   ├── model.onnx              (~23MB)
│   └── model_quantized.onnx    (~6MB)
├── config.json
├── tokenizer.json
├── tokenizer_config.json
└── ...
```
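
If you prefer a scripted check, here is a minimal Deno sketch (verify-model.ts is a hypothetical file name; the required paths come from the list above):

```ts
// verify-model.ts — run from search/models with:
//   deno run --allow-read verify-model.ts
const required = [
  "all-MiniLM-L6-v2/onnx/model.onnx",
  "all-MiniLM-L6-v2/tokenizer.json",
  "all-MiniLM-L6-v2/tokenizer_config.json",
  "all-MiniLM-L6-v2/config.json",
];

for (const path of required) {
  try {
    const info = await Deno.stat(path); // throws if the file is missing
    console.log(`ok      ${path} (${(info.size / 1e6).toFixed(1)} MB)`);
  } catch {
    console.error(`MISSING ${path}`);
  }
}
```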

Switch to Local Model Strategy

In search/index.ts, make sure this line is active:

```ts
import { searchStrategy, generateEmbeddings } from "./transformers-local-onnx.ts";
```

Comment out other strategies:

```ts
// import { searchStrategy, generateEmbeddings } from "./transformers-cosine.ts";
```

Configuration Options

Edit search/transformers-local-onnx.ts to customize:

```ts
// Use the quantized model (faster, slightly less accurate)
const USE_QUANTIZED = true; // Default: false

// Model path (change if you want to use a different model)
const MODEL_PATH = new URL("./models/all-MiniLM-L6-v2", import.meta.url).pathname;
```

Performance Comparison

| Strategy | First Run | Cached | Network | Model Size |
| --- | --- | --- | --- | --- |
| transformers-cosine.ts | ~3-5s | ~150ms | Required (downloads from HF) | ~23MB cached |
| transformers-local-onnx.ts | ~150ms | ~50ms | None | ~23MB local |

The local ONNX version:

  • ✅ No network calls - works completely offline
  • ✅ No downloads on first run - instant startup
  • ✅ No isolate loading delays
  • ✅ Same accuracy as the cached version
  • ✅ Perfect for serverless/edge deployments
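
To reproduce rough numbers like these yourself, here is a quick timing sketch (this assumes generateEmbeddings accepts an array of strings; check its actual signature in transformers-local-onnx.ts):

```ts
// bench.ts — a rough timing sketch, not a rigorous benchmark.
import { generateEmbeddings } from "./transformers-local-onnx.ts";

const texts = ["hello world", "local ONNX models avoid network overhead"];

let t = performance.now();
await generateEmbeddings(texts); // first call loads the model from disk
console.log(`first run: ${(performance.now() - t).toFixed(0)}ms`);

t = performance.now();
await generateEmbeddings(texts); // later calls reuse the loaded session
console.log(`cached:    ${(performance.now() - t).toFixed(0)}ms`);
```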

Deployment Notes

For Val.town

  • Upload the model files to Val.town's blob storage or use a CDN (see the sketch after this list)
  • Update MODEL_PATH to point to the remote location
  • Or use the regular transformers-cosine.ts, which downloads the model to the isolate cache
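
A hedged sketch of the blob-storage option, assuming Val.town's standard blob API (blob.set taking a BodyInit, blob.get returning a Response); verify the exact API against the current Val.town docs:

```ts
import { blob } from "https://esm.town/v/std/blob";

const KEY = "all-MiniLM-L6-v2/model_quantized.onnx";

// One-time upload: stream the quantized model from Hugging Face into blob storage.
const res = await fetch(
  "https://huggingface.co/Xenova/all-MiniLM-L6-v2/resolve/main/onnx/model_quantized.onnx",
);
await blob.set(KEY, res.body!); // blob.set is assumed to accept any BodyInit

// At runtime: read the model bytes back out of blob storage.
const stored = await blob.get(KEY); // assumed to return a Response
const modelBytes = new Uint8Array(await stored.arrayBuffer());
```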

For Deno Deploy

  • Include the models in your deployment package
  • Models are read from the file system at runtime (see the sketch below)
  • Works perfectly with the local strategy
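
As a quick sanity check, files shipped with the deployment are readable with standard Deno file APIs (the path here mirrors MODEL_PATH above, resolved relative to search/):

```ts
// Read the bundled model at startup to confirm it shipped with the deployment.
const modelBytes = await Deno.readFile(
  new URL("./models/all-MiniLM-L6-v2/onnx/model.onnx", import.meta.url),
);
console.log(`model on disk: ${(modelBytes.byteLength / 1e6).toFixed(1)} MB`);
```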

For Cloudflare Workers

  • Workers have a ~1MB script size limit (on the free plan), so the local strategy won't work
  • Use cloudflare-bge-cosine.ts instead, which calls Cloudflare's Workers AI (see the sketch below)
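
For reference, generating embeddings with Workers AI looks roughly like this; a minimal sketch, assuming an AI binding in wrangler.toml and the @cf/baai/bge-base-en-v1.5 model (not necessarily what cloudflare-bge-cosine.ts does internally):

```ts
export default {
  async fetch(_req: Request, env: { AI: any }): Promise<Response> {
    // The AI binding is configured in wrangler.toml ([ai] binding = "AI").
    // bge-base-en-v1.5 returns 768-dimensional embeddings in data[i].
    const { data } = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
      text: ["hello world"],
    });
    return Response.json({ embedding: data[0] });
  },
};
```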

For Render / Railway / Fly.io

  • Include the models directory in your Docker image
  • Works perfectly with local file system access
  • Fastest option for these platforms

Troubleshooting

Error: "Failed to load local ONNX model"

  • Ensure model files are downloaded: ls -lh search/models/all-MiniLM-L6-v2/
  • Run the download script: ./search/models/download-model.sh
  • Check file permissions: chmod -R 755 search/models/all-MiniLM-L6-v2/

Error: "Cannot find module"

  • Make sure you're using the correct import in search/index.ts
  • Check that the file path is correct

Slow performance

  • Try the quantized model: set USE_QUANTIZED = true in transformers-local-onnx.ts
  • Check if model files are on a fast disk (SSD recommended)

Out of memory

  • The model requires ~100-200MB RAM when loaded
  • Use the quantized version to reduce memory usage

Alternative: Use Different Models

You can use other ONNX models from Hugging Face. Popular options:

  • all-MiniLM-L6-v2 (default) - 384 dims, 23MB, fast
  • all-mpnet-base-v2 - 768 dims, 420MB, more accurate
  • paraphrase-multilingual-MiniLM-L12-v2 - 384 dims, 470MB, multilingual

To use a different model:

  1. Download it to search/models/[model-name]
  2. Update MODEL_PATH in transformers-local-onnx.ts
  3. Update the embedding dimension in your data pipeline if needed (see the sketch below)
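
For example, switching to all-mpnet-base-v2 would look roughly like this in transformers-local-onnx.ts (EMBEDDING_DIM is a hypothetical name for wherever your pipeline records the vector size):

```ts
// Point MODEL_PATH at the newly downloaded model directory.
const MODEL_PATH = new URL("./models/all-mpnet-base-v2", import.meta.url).pathname;

// Keep the stored vector size in sync with the model:
// all-mpnet-base-v2 emits 768-dim vectors; the MiniLM models emit 384.
const EMBEDDING_DIM = 768; // hypothetical constant in your data pipeline
```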

Clean Up

To remove the model and reclaim disk space:

```sh
cd search/models
rm -rf all-MiniLM-L6-v2
```

Then switch back to a different strategy in search/index.ts.
