Pluggable semantic search system with multiple embedding strategies.
The fastest option for production is local ONNX models:
```bash
# Download the model (one-time setup, ~90MB)
cd models
./download-model.sh
```
Edit `search/index.ts` and uncomment the desired strategy:
```typescript
import { searchStrategy, generateEmbeddings } from "./transformers-local-onnx.ts";
```
```typescript
import { searchPages } from "./search/index.ts";

const results = await searchPages("How to use Groq API?", pages, {
  limit: 10,
  minScore: 50,
  enableTiming: true,
});
```
| Strategy | Speed | Cost | Setup | Best For |
|---|---|---|---|---|
| transformers-local-onnx ⭐ | ~60-80ms | Free | Download model | Production |
| transformers-cosine | ~160-180ms | Free | None (auto-download) | Development |
| mixedbread-embeddings | ~50-100ms | Free tier | API key | High accuracy |
| openai-cosine | ~100-200ms | Paid | API key | Reliability |
| hf-inference-qwen3 | ~150-300ms | Free tier | API key | Best accuracy |
| cloudflare-bge | ~50-150ms | Free tier | API key | Cloudflare Workers |
| jigsawstack-orama | ~550ms | Free tier | API key | Managed solution |
⭐ = Recommended for production
- `index.ts` - Main entry point, switch strategies here
- `types.ts` - TypeScript interfaces for search system
- `utils.ts` - Shared utilities (cosine similarity, snippet generation)
- `transformers-local-onnx.ts` - Local ONNX models (fastest, recommended)
- `transformers-cosine.ts` - Auto-download ONNX models
- `mixedbread-embeddings-cosine.ts` - Mixedbread API + local cosine
- `openai-cosine.ts` - OpenAI embeddings + local cosine
- `hf-inference-qwen3-cosine.ts` - HuggingFace Qwen3-8B embeddings
- `cloudflare-bge-cosine.ts` - Cloudflare Workers AI
- `jigsawstack-orama.ts` - JigsawStack managed search
- `mixedbread.ts` - Mixedbread Stores (managed)
- `placeholder.ts` - Fake embeddings for testing
- `models/README.md` - Model setup instructions
- `models/SETUP.md` - Detailed setup guide
- `STRATEGY-COMPARISON.md` - Detailed comparison of all strategies

Test the local ONNX model:
```bash
cd models
deno run --allow-read --allow-env --allow-net test-local-model.ts
```
Run the full search harness:
```bash
cd ../testing
deno run --allow-read --allow-env --allow-net test-search.ts
```
```typescript
async function searchPages(
  query: string,
  pages: Page[],
  options?: SearchOptions
): Promise<SearchResult[]>
```
Options:
- `limit`: Maximum results to return (default: 10)
- `minScore`: Minimum similarity score 0-100 (default: 0)
- `enableTiming`: Log timing breakdown (default: false)

Returns: Array of search results sorted by relevance.
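The way these options combine can be sketched as follows; the `SearchResult` fields here are illustrative assumptions, not the actual shapes in `types.ts`:

```typescript
// Sketch of how limit and minScore are applied to raw results.
// The real types.ts/utils.ts may differ; field names are assumed.
interface SearchResult {
  url: string;
  score: number; // 0-100 similarity score
  snippet: string;
}

function applyOptions(
  results: SearchResult[],
  { limit = 10, minScore = 0 }: { limit?: number; minScore?: number } = {}
): SearchResult[] {
  return results
    .filter((r) => r.score >= minScore) // drop low-similarity hits
    .sort((a, b) => b.score - a.score)  // highest relevance first
    .slice(0, limit);                   // cap the result count
}
```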
```typescript
async function generateEmbeddings(
  content: string
): Promise<number[] | null>
```
Returns: a 384-dimensional embedding vector (or the strategy's configured dimension), or `null` if embedding fails.
```
Query
  ↓
Generate Query Embedding (10-30ms)
  ↓
Compare with Page Embeddings (cosine similarity, <1ms per page)
  ↓
Sort by Similarity
  ↓
Generate Snippets
  ↓
Return Results
```
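The cosine-similarity step above can be sketched like this; the 0-100 scaling is an assumption made to match `minScore`, and `utils.ts` may scale differently:

```typescript
// Cosine similarity between two embedding vectors (e.g. the
// 384-dimensional vectors above). Sketch only; utils.ts may differ.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Map similarity in [-1, 1] to the 0-100 range used by minScore
// (assumed scaling, not confirmed by the project).
function toScore(similarity: number): number {
  return Math.round(((similarity + 1) / 2) * 100);
}
```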
Performance tips:

- Enable quantized models (`USE_QUANTIZED = true`)
- Set `minScore` to filter low-quality results

Choosing a strategy:

- Production: ✅ Use `transformers-local-onnx.ts`
- Cloudflare Workers: ✅ Use `cloudflare-bge-cosine.ts`
- Development: ✅ Use `transformers-cosine.ts`
- Lowest latency: ✅ Use `transformers-local-onnx.ts`
If deploying in a container, make sure the model directory is copied into the image:

```dockerfile
COPY search/models/all-MiniLM-L6-v2 /app/search/models/all-MiniLM-L6-v2
```
Download the model:
```bash
cd search/models
./download-model.sh
```
Check the import in `search/index.ts`:

```typescript
import { searchStrategy, generateEmbeddings } from "./transformers-local-onnx.ts";
```
Also verify that the `USE_QUANTIZED` flag matches the model files you downloaded (e.g. `USE_QUANTIZED = true`).

If switching strategies with different embedding dimensions:
```
GET /cache/recalculate?force=true
```
This regenerates all embeddings with the new strategy.
To add a new search strategy:
1. Create `search/my-strategy.ts`
2. Implement the `SearchStrategy` interface:
```typescript
export const searchStrategy: SearchStrategy = {
  name: "my-strategy",
  description: "Description...",
  search: async (query, pages, options) => {
    // Implementation
  },
};

export const generateEmbeddings = async (content: string) => {
  // Generate embeddings
};
```
3. Document it in `STRATEGY-COMPARISON.md`
4. Register it in `index.ts` as an option

Part of the groq-docs project.
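A complete (if trivial) strategy in the spirit of `placeholder.ts` might look like the sketch below: deterministic fake embeddings so the plumbing can be exercised without downloading a model. The `Page`/result shapes here are assumptions made so the example is self-contained, not the project's actual interfaces:

```typescript
// Illustrative placeholder-style strategy; type names are assumed.
interface Page { url: string; content: string; embedding?: number[]; }
interface Result { url: string; score: number; }

// Deterministic fake embedding: bucket character codes, then normalize
// to a unit vector so a plain dot product equals cosine similarity.
function fakeEmbedding(text: string, dims = 8): number[] {
  const v = new Array(dims).fill(0);
  for (let i = 0; i < text.length; i++) v[i % dims] += text.charCodeAt(i);
  const norm = Math.sqrt(v.reduce((s, x) => s + x * x, 0)) || 1;
  return v.map((x) => x / norm);
}

const searchStrategy = {
  name: "placeholder",
  description: "Deterministic fake embeddings for testing",
  search: async (query: string, pages: Page[]): Promise<Result[]> => {
    const q = fakeEmbedding(query);
    return pages
      .map((p) => {
        const e = p.embedding ?? fakeEmbedding(p.content);
        const sim = e.reduce((s, x, i) => s + x * q[i], 0); // unit vectors → cosine
        return { url: p.url, score: Math.round(sim * 100) };
      })
      .sort((a, b) => b.score - a.score);
  },
};
```

Swapping `fakeEmbedding` for a real model call (and wiring up `generateEmbeddings`) is all that separates this from a production strategy.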