groq-docs
Pluggable semantic search system with multiple embedding strategies.
The fastest option for production is local ONNX models:
```bash
# Download the model (one-time setup, ~90MB)
cd models
./download-model.sh
```
Edit search/index.ts and uncomment the desired strategy:
```typescript
import { searchStrategy, generateEmbeddings } from "./transformers-local-onnx.ts";
```
```typescript
import { searchPages } from "./search/index.ts";

const results = await searchPages("How to use Groq API?", pages, {
  limit: 10,
  minScore: 50,
  enableTiming: true,
});
```
| Strategy | Speed | Cost | Setup | Best For |
|---|---|---|---|---|
| transformers-local-onnx ⭐ | ~60-80ms | Free | Download model | Production |
| transformers-cosine | ~160-180ms | Free | None (auto-download) | Development |
| mixedbread-embeddings | ~50-100ms | Free tier | API key | High accuracy |
| openai-cosine | ~100-200ms | Paid | API key | Reliability |
| hf-inference-qwen3 | ~150-300ms | Free tier | API key | Best accuracy |
| cloudflare-bge | ~50-150ms | Free tier | API key | Cloudflare Workers |
| jigsawstack-orama | ~550ms | Free tier | API key | Managed solution |
⭐ = Recommended for production
- `index.ts` - Main entry point, switch strategies here
- `types.ts` - TypeScript interfaces for the search system
- `utils.ts` - Shared utilities (cosine similarity, snippet generation)
- `transformers-local-onnx.ts` - Local ONNX models (fastest, recommended)
- `transformers-cosine.ts` - Auto-download ONNX models
- `mixedbread-embeddings-cosine.ts` - Mixedbread API + local cosine
- `openai-cosine.ts` - OpenAI embeddings + local cosine
- `hf-inference-qwen3-cosine.ts` - HuggingFace Qwen3-8B embeddings
- `cloudflare-bge-cosine.ts` - Cloudflare Workers AI
- `jigsawstack-orama.ts` - JigsawStack managed search
- `mixedbread.ts` - Mixedbread Stores (managed)
- `placeholder.ts` - Fake embeddings for testing
- `models/README.md` - Model setup instructions
- `models/SETUP.md` - Detailed setup guide
- `STRATEGY-COMPARISON.md` - Detailed comparison of all strategies
Test the local ONNX model:
```bash
cd models
deno run --allow-read --allow-env --allow-net test-local-model.ts
```
Run the full search harness:
```bash
cd ../testing
deno run --allow-read --allow-env --allow-net test-search.ts
```
```typescript
async function searchPages(
  query: string,
  pages: Page[],
  options?: SearchOptions
): Promise<SearchResult[]>
```
Options:
- `limit`: Maximum results to return (default: 10)
- `minScore`: Minimum similarity score, 0-100 (default: 0)
- `enableTiming`: Log timing breakdown (default: false)
Returns: Array of search results sorted by relevance
```typescript
async function generateEmbeddings(
  content: string
): Promise<number[] | null>
```
Returns: 384-dimensional embedding vector (or configured dimensions)
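The cosine-based strategies all score pages the same way: plain cosine similarity between the query embedding and each page embedding. A minimal sketch of that math (the real helper lives in `utils.ts`; this version is for illustration only):

```typescript
// Cosine similarity between two embedding vectors of equal length.
// Returns a value in [-1, 1]; the search scores scale this to the
// 0-100 range used by minScore. (Sketch only - see utils.ts.)
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Identical vectors score 1, orthogonal vectors score 0, which is why the comparison step below costs less than a millisecond per page.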
```
Query
  ↓
Generate Query Embedding (10-30ms)
  ↓
Compare with Page Embeddings (cosine similarity, <1ms per page)
  ↓
Sort by Similarity
  ↓
Generate Snippets
  ↓
Return Results
```
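The ranking half of that pipeline (everything after the query embedding) can be sketched end-to-end. Types and helper names here are hypothetical; the real implementation lives in `search/index.ts` and `search/types.ts`:

```typescript
interface Page {
  url: string;
  content: string;
  embedding: number[]; // pre-calculated, not generated at query time
}

// Compare, filter, sort, snippet, return - given a precomputed query
// embedding. (Hypothetical shapes; see search/types.ts for the real ones.)
function rankPages(queryEmbedding: number[], pages: Page[], limit = 10, minScore = 0) {
  const cosine = (a: number[], b: number[]) => {
    let dot = 0, na = 0, nb = 0;
    for (let i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      na += a[i] * a[i];
      nb += b[i] * b[i];
    }
    return dot / (Math.sqrt(na) * Math.sqrt(nb));
  };
  return pages
    .map((p) => ({
      url: p.url,
      score: Math.round(cosine(queryEmbedding, p.embedding) * 100), // scale to 0-100
      snippet: p.content.slice(0, 160),
    }))
    .filter((r) => r.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```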
- Use local ONNX models for production (fastest, most reliable)
- Pre-calculate embeddings during cache recalculation (don't generate them at query time)
- Cache the pipeline (done automatically, but worth noting)
- Use quantized models if memory is constrained (set `USE_QUANTIZED = true`)
- Adjust `minScore` to filter out low-quality results
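The pre-calculation tip amounts to keeping embeddings keyed by page so that query time never pays the model cost. A sketch of that pattern (the cache shape here is an assumption, not the project's actual cache):

```typescript
// In-memory embedding cache (hypothetical shape). Embeddings are
// generated once per page - e.g. during cache recalculation - and
// reused for every subsequent query.
const embeddingCache = new Map<string, number[]>();

async function getEmbedding(
  url: string,
  content: string,
  embed: (c: string) => Promise<number[] | null>,
): Promise<number[] | null> {
  const cached = embeddingCache.get(url);
  if (cached) return cached; // query-time path: no model call
  const vec = await embed(content);
  if (vec) embeddingCache.set(url, vec);
  return vec;
}
```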
✅ Use transformers-local-onnx.ts
- Include model files in deployment
- Fast, reliable, no network calls
✅ Use cloudflare-bge-cosine.ts
- Workers have size limits (can't fit local models)
- Cloudflare AI is optimized for Workers
✅ Use transformers-cosine.ts
- The isolate caches downloaded models
- No persistent file system is available for pre-downloaded models
✅ Use transformers-local-onnx.ts
- Include the models in the image:

```dockerfile
COPY search/models/all-MiniLM-L6-v2 /app/search/models/all-MiniLM-L6-v2
```
Download the model:
```bash
cd search/models
./download-model.sh
```
Check the import in search/index.ts:
```typescript
import { searchStrategy, generateEmbeddings } from "./transformers-local-onnx.ts";
```
- Check which strategy is active
- Ensure the model is cached (the first run is slower)
- Try the quantized model (`USE_QUANTIZED = true`)
- Check network latency (for API-based strategies)
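When `enableTiming` logs aren't granular enough, a small wrapper around any async stage helps separate embedding time from network time. The helper name here is hypothetical:

```typescript
// Measure how long an async stage takes and log it - useful for
// telling model latency apart from network latency. (Sketch only.)
async function timed<T>(label: string, fn: () => Promise<T>): Promise<T> {
  const start = performance.now();
  const result = await fn();
  console.log(`${label}: ${(performance.now() - start).toFixed(1)}ms`);
  return result;
}
```

For example: `await timed("embed query", () => generateEmbeddings(query))`.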
If switching strategies with different dimensions:
```
GET /cache/recalculate?force=true
```
This regenerates all embeddings with the new strategy.
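A quick way to detect that a recalculation is needed is to compare cached vector lengths against the active strategy's dimensionality (e.g. 384 for the local MiniLM model). This helper is a hypothetical sketch, not part of the project's API:

```typescript
// Returns true if any cached embedding does not match the dimensions
// of the active strategy - a sign that the strategy changed and all
// embeddings should be regenerated.
function needsRecalculation(cachedEmbeddings: number[][], expectedDims: number): boolean {
  return cachedEmbeddings.some((vec) => vec.length !== expectedDims);
}
```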
To add a new search strategy:
1. Create `search/my-strategy.ts`
2. Implement the `SearchStrategy` interface:

   ```typescript
   export const searchStrategy: SearchStrategy = {
     name: "my-strategy",
     description: "Description...",
     search: async (query, pages, options) => {
       // Implementation
     },
   };

   export const generateEmbeddings = async (content: string) => {
     // Generate embeddings
   };
   ```

3. Document it in `STRATEGY-COMPARISON.md`
4. Add it to `index.ts` as an option
Part of the groq-docs project.
