Quick Start Guide - Local ONNX Models

Get the fastest semantic search running in 3 steps.

Step 1: Download the Model (One-Time)

```sh
cd search/models
./download-model.sh
```

Time: ~2-5 minutes (depending on internet speed)
Size: ~90MB download
What it does: Downloads all-MiniLM-L6-v2 ONNX model from Hugging Face
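
To sanity-check the download from code rather than with `ls`, a minimal Deno script like the one below works. It only checks the file this guide names explicitly (`onnx/model.onnx`); the script name is hypothetical and not part of the repo.

```ts
// verify-model.ts (hypothetical helper) - run from search/models with:
//   deno run --allow-read verify-model.ts
// Confirms the ONNX weights fetched by download-model.sh are actually on disk.
const modelFile = "all-MiniLM-L6-v2/onnx/model.onnx";

try {
  const info = await Deno.stat(modelFile);
  console.log(`✅ ${modelFile} (${(info.size / 1_000_000).toFixed(1)} MB)`); // expect roughly 23 MB
} catch {
  console.error(`❌ ${modelFile} not found - re-run ./download-model.sh`);
}
```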

Step 2: Activate the Strategy

Edit search/index.ts:

```ts
// Comment out the current strategy
// import { searchStrategy, generateEmbeddings } from "./transformers-cosine.ts";

// Uncomment the local ONNX strategy
import { searchStrategy, generateEmbeddings } from "./transformers-local-onnx.ts";
```
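
The module you just switched to is what the rest of the search code calls to embed text. Its exact signature isn't shown in this guide, so the sketch below assumes generateEmbeddings takes an array of strings and returns one 384-dimensional vector per string (the dimension reported by the test in Step 3); the cosineSimilarity helper and the sample passages are illustrative, not part of the repo.

```ts
// sketch.ts (illustrative) - run from the search/ directory.
// Assumes: generateEmbeddings(texts: string[]): Promise<number[][]>
import { generateEmbeddings } from "./transformers-local-onnx.ts";

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Embed a query plus two candidate passages, then rank the passages.
const [query, ...docs] = await generateEmbeddings([
  "How to use Groq API?",
  "Authenticate with the Groq API using an API key.",
  "Val Town lets you deploy crons from the browser.",
]);

docs
  .map((vec, i) => ({ i, score: cosineSimilarity(query, vec) }))
  .sort((a, b) => b.score - a.score)
  .forEach(({ i, score }) => console.log(`doc ${i}: ${score.toFixed(3)}`));
```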

Step 3: Test It

```sh
cd models
deno run --allow-read --allow-env --allow-net --allow-ffi test-local-model.ts
```

Note: --allow-ffi is required for ONNX runtime native bindings.

Expected output:

📂 Loading local ONNX model from: /path/to/models/all-MiniLM-L6-v2
   Using full model
✅ Local ONNX model loaded successfully:
   npm import: 102ms
   pipeline load: 137ms (from local files)
   total model load: 239ms

✅ Generated embeddings successfully!
   Dimensions: 384
   First 5 values: [-0.0457, -0.0109, -0.0935, ...]
   Time: 247ms

Test 2: Generate embeddings for multiple queries (testing cache)
✅ "How to use Groq API?"
   Time: 3.87ms (cached pipeline)

Note: You may see a harmless "mutex lock failed" error at the very end. This is a known ONNX runtime cleanup issue in Deno that only affects standalone scripts that exit immediately; your long-running server (main.tsx) won't hit it. All functionality works correctly!

That's It! 🎉

Your search is now running with:

  • ✅ No network calls (100% offline)
  • ✅ No API keys needed
  • ✅ ~10-30ms query time
  • ✅ ~60-80ms total search time

Verify in Your App

Start your app and check the /search/test endpoint:

deno run --allow-net --allow-env --allow-read main.tsx

Then visit:

http://localhost:8000/search/test?q=How%20to%20use%20Groq%20API

Look for:

```json
{
  "metadata": {
    "strategy": "transformers-local-onnx",
    "localModel": true,
    "modelPath": "/path/to/models/all-MiniLM-L6-v2",
    "timings": {
      "queryEmbedding": 25,
      "total": 65
    }
  }
}
```
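
If you'd rather script this check than eyeball the JSON, a small sketch like the following works, assuming the server above is listening on localhost:8000 and returns the metadata shape shown; the script name is illustrative.

```ts
// check-search.ts (illustrative) - run with: deno run --allow-net check-search.ts
const res = await fetch(
  "http://localhost:8000/search/test?q=" + encodeURIComponent("How to use Groq API"),
);
const { metadata } = await res.json();

if (metadata?.strategy === "transformers-local-onnx" && metadata?.localModel) {
  console.log(`✅ local ONNX strategy active (total: ${metadata.timings?.total}ms)`);
} else {
  console.error("❌ unexpected strategy:", metadata?.strategy);
}
```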

Troubleshooting

"Failed to load local ONNX model"

Cause: Model files not found
Fix: Make sure you ran the download script:

```sh
cd search/models
./download-model.sh
```

Verify files exist:

ls -lh all-MiniLM-L6-v2/onnx/

You should see model.onnx (~23MB).

"Module not found"

Cause: Wrong import path
Fix: Check search/index.ts has the correct import:

import { searchStrategy, generateEmbeddings } from "./transformers-local-onnx.ts";

Still slow (~3-5s first query)

Cause: Using wrong strategy (auto-download version)
Fix: Confirm the import is transformers-local-onnx.ts, not transformers-cosine.ts

Next Steps

Optional: Use Quantized Model (Faster, Smaller)

Edit search/transformers-local-onnx.ts:

const USE_QUANTIZED = true; // Change from false

Benefits:

  • Smaller: ~6MB vs ~23MB
  • Faster: ~15-20ms vs ~25-30ms per query
  • Slightly less accurate: ~57 vs ~58 MTEB score
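
If you're curious how such a flag might be wired, the sketch below shows one possible implementation, assuming the strategy is built on Transformers.js (@xenova/transformers), which loads onnx/model_quantized.onnx when its quantized option is set. This is not the repo's actual transformers-local-onnx.ts; the local path handling is illustrative, and it assumes the quantized weights were downloaded alongside the full model.

```ts
// One possible wiring for USE_QUANTIZED (sketch only).
import { pipeline, env } from "npm:@xenova/transformers";

const USE_QUANTIZED = true;

// Serve model files from the local directory and never hit the network.
env.localModelPath = new URL("./models/", import.meta.url).pathname;
env.allowRemoteModels = false;

// quantized: true -> Transformers.js loads onnx/model_quantized.onnx
// quantized: false -> it loads onnx/model.onnx
const extractor = await pipeline("feature-extraction", "all-MiniLM-L6-v2", {
  quantized: USE_QUANTIZED,
});

const output = await extractor("How to use Groq API?", { pooling: "mean", normalize: true });
console.log(output.dims); // [1, 384]
```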

Optional: Recalculate Embeddings

If you were using a different strategy before, regenerate embeddings:

GET /cache/recalculate?force=true

This ensures all page embeddings use the same model.
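
Since this is a plain GET request, any HTTP client works; here is a minimal sketch, assuming the same localhost:8000 server as in the verification step above.

```ts
// recalculate.ts (illustrative) - run with: deno run --allow-net recalculate.ts
// Forces cached page embeddings to be regenerated with the active strategy.
const res = await fetch("http://localhost:8000/cache/recalculate?force=true");
console.log(res.status, await res.text());
```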

Performance Comparison

| Metric | Before (transformers-cosine) | After (transformers-local-onnx) |
| --- | --- | --- |
| First run | ~3-5s | ~150ms |
| Cached | ~150ms | ~50ms |
| Query | ~10-30ms | ~10-30ms |
| Total | ~160-180ms | ~60-80ms |
| Network | Required (first run) | None |

Result: ~2-3x faster, no network dependency! 🚀

Resources

  • Full Setup Guide: search/models/SETUP.md
  • Strategy Comparison: search/STRATEGY-COMPARISON.md
  • Search Module Docs: search/README.md
  • Main Docs: README.md (see "Search" section)

Need Help?

  1. Check search/models/SETUP.md for detailed instructions
  2. Read search/STRATEGY-COMPARISON.md to compare all strategies
  3. Run search/models/test-local-model.ts to verify your setup
  4. Check the console for error messages with helpful hints