Pluggable semantic search system with multiple embedding strategies.
The fastest option for production is local ONNX models:
```bash
# Download the model (one-time setup, ~90MB)
cd models
./download-model.sh
```
Edit `search/index.ts` and uncomment the desired strategy:
```typescript
import { searchStrategy, generateEmbeddings } from "./transformers-local-onnx.ts";
```
```typescript
import { searchPages } from "./search/index.ts";

const results = await searchPages("How to use Groq API?", pages, {
  limit: 10,
  minScore: 50,
  enableTiming: true,
});
```
| Strategy | Speed | Cost | Setup | Best For |
|---|---|---|---|---|
| transformers-local-onnx ⭐ | ~60-80ms | Free | Download model | Production |
| transformers-cosine | ~160-180ms | Free | None (auto-download) | Development |
| mixedbread-embeddings | ~50-100ms | Free tier | API key | High accuracy |
| openai-cosine | ~100-200ms | Paid | API key | Reliability |
| hf-inference-qwen3 | ~150-300ms | Free tier | API key | Best accuracy |
| cloudflare-bge | ~50-150ms | Free tier | API key | Cloudflare Workers |
| jigsawstack-orama | ~550ms | Free tier | API key | Managed solution |
⭐ = Recommended for production
- `index.ts` - Main entry point, switch strategies here
- `types.ts` - TypeScript interfaces for search system
- `utils.ts` - Shared utilities (cosine similarity, snippet generation)
- `transformers-local-onnx.ts` - Local ONNX models (fastest, recommended)
- `transformers-cosine.ts` - Auto-download ONNX models
- `mixedbread-embeddings-cosine.ts` - Mixedbread API + local cosine
- `openai-cosine.ts` - OpenAI embeddings + local cosine
- `hf-inference-qwen3-cosine.ts` - HuggingFace Qwen3-8B embeddings
- `cloudflare-bge-cosine.ts` - Cloudflare Workers AI
- `jigsawstack-orama.ts` - JigsawStack managed search
- `mixedbread.ts` - Mixedbread Stores (managed)
- `placeholder.ts` - Fake embeddings for testing
- `models/README.md` - Model setup instructions
- `models/SETUP.md` - Detailed setup guide
- `STRATEGY-COMPARISON.md` - Detailed comparison of all strategies

Test the local ONNX model:
```bash
cd models
deno run --allow-read --allow-env --allow-net test-local-model.ts
```
Run the full search harness:
```bash
cd ../testing
deno run --allow-read --allow-env --allow-net test-search.ts
```
```typescript
async function searchPages(
  query: string,
  pages: Page[],
  options?: SearchOptions
): Promise<SearchResult[]>
```
Options:
- `limit`: Maximum results to return (default: 10)
- `minScore`: Minimum similarity score 0-100 (default: 0)
- `enableTiming`: Log timing breakdown (default: false)

Returns: Array of search results sorted by relevance.
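The way these options combine can be sketched as follows; the `SearchResult` fields here are illustrative assumptions, not the actual shapes in `types.ts`:

```typescript
// Sketch of how limit and minScore are applied to raw results.
// The real types.ts/utils.ts may differ; field names are assumed.
interface SearchResult {
  url: string;
  score: number; // 0-100 similarity score
  snippet: string;
}

function applyOptions(
  results: SearchResult[],
  { limit = 10, minScore = 0 }: { limit?: number; minScore?: number } = {}
): SearchResult[] {
  return results
    .filter((r) => r.score >= minScore) // drop low-similarity hits
    .sort((a, b) => b.score - a.score)  // highest relevance first
    .slice(0, limit);                   // cap the result count
}
```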
```typescript
async function generateEmbeddings(
  content: string
): Promise<number[] | null>
```
Returns: a 384-dimensional embedding vector (or the strategy's configured dimension), or `null` if embedding fails.
```
Query
  ↓
Generate Query Embedding (10-30ms)
  ↓
Compare with Page Embeddings (cosine similarity, <1ms per page)
  ↓
Sort by Similarity
  ↓
Generate Snippets
  ↓
Return Results
```
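The cosine-similarity step above can be sketched like this; the 0-100 scaling is an assumption made to match `minScore`, and `utils.ts` may scale differently:

```typescript
// Cosine similarity between two embedding vectors (e.g. the
// 384-dimensional vectors above). Sketch only; utils.ts may differ.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Map similarity in [-1, 1] to the 0-100 range used by minScore
// (assumed scaling, not confirmed by the project).
function toScore(similarity: number): number {
  return Math.round(((similarity + 1) / 2) * 100);
}
```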
Performance tips:

- Enable quantized models (`USE_QUANTIZED = true`)
- Set `minScore` to filter low-quality results

Choosing a strategy:

- Production: ✅ Use `transformers-local-onnx.ts`
- Cloudflare Workers: ✅ Use `cloudflare-bge-cosine.ts`
- Development: ✅ Use `transformers-cosine.ts`
- Lowest latency: ✅ Use `transformers-local-onnx.ts`
If deploying in a container, make sure the model directory is copied into the image:

```dockerfile
COPY search/models/all-MiniLM-L6-v2 /app/search/models/all-MiniLM-L6-v2
```
Download the model:
```bash
cd search/models
./download-model.sh
```
Check the import in `search/index.ts`:

```typescript
import { searchStrategy, generateEmbeddings } from "./transformers-local-onnx.ts";
```
Also verify that the `USE_QUANTIZED` flag matches the model files you downloaded (e.g. `USE_QUANTIZED = true`).

If switching strategies with different embedding dimensions:
```
GET /cache/recalculate?force=true
```
This regenerates all embeddings with the new strategy.
To add a new search strategy:
1. Create `search/my-strategy.ts`
2. Implement the `SearchStrategy` interface:
```typescript
export const searchStrategy: SearchStrategy = {
  name: "my-strategy",
  description: "Description...",
  search: async (query, pages, options) => {
    // Implementation
  },
};

export const generateEmbeddings = async (content: string) => {
  // Generate embeddings
};
```
3. Document it in `STRATEGY-COMPARISON.md`
4. Register it in `index.ts` as an option

Part of the groq-docs project.
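A complete (if trivial) strategy in the spirit of `placeholder.ts` might look like the sketch below: deterministic fake embeddings so the plumbing can be exercised without downloading a model. The `Page`/result shapes here are assumptions made so the example is self-contained, not the project's actual interfaces:

```typescript
// Illustrative placeholder-style strategy; type names are assumed.
interface Page { url: string; content: string; embedding?: number[]; }
interface Result { url: string; score: number; }

// Deterministic fake embedding: bucket character codes, then normalize
// to a unit vector so a plain dot product equals cosine similarity.
function fakeEmbedding(text: string, dims = 8): number[] {
  const v = new Array(dims).fill(0);
  for (let i = 0; i < text.length; i++) v[i % dims] += text.charCodeAt(i);
  const norm = Math.sqrt(v.reduce((s, x) => s + x * x, 0)) || 1;
  return v.map((x) => x / norm);
}

const searchStrategy = {
  name: "placeholder",
  description: "Deterministic fake embeddings for testing",
  search: async (query: string, pages: Page[]): Promise<Result[]> => {
    const q = fakeEmbedding(query);
    return pages
      .map((p) => {
        const e = p.embedding ?? fakeEmbedding(p.content);
        const sim = e.reduce((s, x, i) => s + x * q[i], 0); // unit vectors → cosine
        return { url: p.url, score: Math.round(sim * 100) };
      })
      .sort((a, b) => b.score - a.score);
  },
};
```

Swapping `fakeEmbedding` for a real model call (and wiring up `generateEmbeddings`) is all that separates this from a production strategy.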