# Search
The `mixedbread.ts` strategy delegates storage, embedding, and search to Mixedbread's managed Store API:

```typescript
// Mixedbread Stores Strategy: Managed AI Search Service
// Uses Mixedbread's Store API for document storage and semantic search
// No local embeddings needed - Mixedbread handles everything
import type { SearchStrategy, SearchResult, Page, SearchOptions } from "./types.ts";

// This function is required for compatibility with the recalculation system.
// For the managed Mixedbread Store, embeddings are handled by Mixedbread internally,
// so we return a dummy array to satisfy the recalculation script. The actual embeddings
// are generated and stored by Mixedbread when documents are uploaded.
export const generateEmbeddings = async (_content: string): Promise<number[] | null> => {
  // The recalculation script requires this, but for the Mixedbread Store you should
  // use `deno task recalc-mxbai` instead, which uploads docs to Mixedbread.
  return [0]; // Dummy value - actual embeddings are handled by Mixedbread Store
};

export const searchStrategy: SearchStrategy = {
  name: "mixedbread",
  description: "Managed AI search using Mixedbread Stores (handles storage, embeddings, and search)",
  search: async (query: string, _pages: Page[], options: SearchOptions = {}): Promise<SearchResult[]> => {
    const limit = options.limit || 10;
    // ... (rest of the search implementation)
  },
};
```

The active strategy is selected by a single import in `search/index.ts`:

```typescript
import { searchStrategy, generateEmbeddings } from "./transformers-local-onnx.ts";
```

## Performance Comparison

| Strategy | Speed | Cost | Setup | Best For |
| --- | --- | --- | --- | --- |
| **transformers-local-onnx** ⭐ | ~60-80ms | Free | Download model | Production |
| **transformers-cosine** | ~160-180ms | Free | None (auto-download) | Development |
| **mixedbread-embeddings** | ~50-100ms | Free tier | API key | High accuracy |
| **openai-cosine** | ~100-200ms | Paid | API key | Reliability |
| **hf-inference-qwen3** | ~150-300ms | Free tier | API key | Best accuracy |

Available strategy files:

- **`transformers-local-onnx.ts`** - Local ONNX models (fastest, recommended)
- **`transformers-cosine.ts`** - Auto-download ONNX models
- **`mixedbread-embeddings-cosine.ts`** - Mixedbread API + local cosine
- **`openai-cosine.ts`** - OpenAI embeddings + local cosine
- **`hf-inference-qwen3-cosine.ts`** - HuggingFace Qwen3-8B embeddings
- **`cloudflare-bge-cosine.ts`** - Cloudflare Workers AI
- **`jigsawstack-orama.ts`** - JigsawStack managed search
- **`mixedbread.ts`** - Mixedbread Stores (managed)
- **`placeholder.ts`** - Fake embeddings for testing

## Documentation

### Search

**Returns**: Array of search results sorted by relevance

### Generate Embeddings

```typescript
async function generateEmbeddings(
  content: string
): Promise<number[] | null>
```

Query flow:

```
Generate Query Embedding (10-30ms)
  ↓
Compare with Page Embeddings (cosine similarity, <1ms per page)
  ↓
Sort by Similarity
```

Best practices:

1. **Use local ONNX models** for production (fastest, most reliable)
2. **Pre-calculate embeddings** during recalculation (don't generate at query time)
3. **Cache the pipeline** (automatically done, but worth noting)
4. **Use quantized models** if memory is constrained (set `USE_QUANTIZED = true`)
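The query flow above is the core of every cosine-based strategy: embed the query once, score each page against it, and sort. A minimal sketch of that loop follows (hypothetical helper names, assuming pages carry an `embeddings: number[] | null` field as in the test data further down; this is an illustration, not the repository's implementation):

```typescript
import { generateEmbeddings } from "./transformers-local-onnx.ts";

// Cosine similarity between two embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Embed the query once, then score and sort the pre-computed page embeddings.
async function rankPages(
  query: string,
  pages: { title: string; embeddings: number[] | null }[],
  limit = 10,
) {
  const queryEmbedding = await generateEmbeddings(query);
  if (!queryEmbedding) return [];
  return pages
    .filter((p) => p.embeddings !== null)
    .map((p) => ({ title: p.title, score: cosineSimilarity(queryEmbedding, p.embeddings!) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```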
Check the import in `search/index.ts`:

```typescript
import { searchStrategy, generateEmbeddings } from "./transformers-local-onnx.ts";
```

Running the recalculation task afterwards regenerates all embeddings with the new strategy.

## Contributing

A new strategy module exports a `searchStrategy` object and a `generateEmbeddings` function:

```typescript
export const searchStrategy: SearchStrategy = {
  // name, description, and search() implementation
};

export const generateEmbeddings = async (content: string) => {
  // Generate embeddings
};
```

The local ONNX strategy uses the `Xenova/all-MiniLM-L6-v2` model (384-dimensional embeddings). You can then use the model to compute embeddings like this:

```js
import { pipeline } from '@xenova/transformers';

const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

// Compute sentence embeddings
const sentences = ['This is an example sentence', 'Each sentence is converted'];
const output = await extractor(sentences, { pooling: 'mean', normalize: true });
```

The model itself is a standard BERT encoder, as its `config.json` shows (excerpt):

```json
"intermediate_size": 1536,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
```

The setup script finishes by printing the next steps:

```bash
echo "Next steps:"
echo "1. Update search/index.ts to use the local ONNX strategy:"
echo "   import { searchStrategy, generateEmbeddings } from \"./transformers-local-onnx.ts\";"
echo ""
echo "2. Run your application - the model will load from local files!"
```

Two test scripts exercise the local ONNX strategy. `test-server-mode.ts` (excerpt):

```typescript
// Run with: deno run --allow-read --allow-env --allow-net --allow-ffi test-server-mode.ts
import { searchStrategy, generateEmbeddings } from "../transformers-local-onnx.ts";

console.log("🧪 Testing ONNX in Server Mode (long-running process)\n");

// ... inside the query loop:
  const start = performance.now();
  const embedding = await generateEmbeddings(query);
  const elapsed = performance.now() - start;
```

`test-local-model.ts` (excerpt):

```typescript
// Run with: deno run --allow-read --allow-env --allow-net test-local-model.ts
import { searchStrategy, generateEmbeddings } from "../transformers-local-onnx.ts";

console.log("🧪 Testing Local ONNX Model Strategy\n");

// Test 1: Generate embeddings for a simple query
console.log("Test 1: Generate embeddings for a query");
console.log("Query: 'What is Groq?'\n");

const start = performance.now();
const embeddings = await generateEmbeddings("What is Groq?");
const elapsed = performance.now() - start;

if (embeddings) {
  console.log(`✅ Generated embeddings successfully!`);
  console.log(`   Dimensions: ${embeddings.length}`);
  console.log(`   First 5 values: [${embeddings.slice(0, 5).map(v => v.toFixed(4)).join(", ")}...]`);
  console.log(`   Time: ${elapsed.toFixed(2)}ms`);
} else {
  console.log(`❌ Failed to generate embeddings`);
}

console.log("\n" + "=".repeat(60) + "\n");

// Test 2: Generate embeddings for multiple queries (to test caching)
console.log("Test 2: Generate embeddings for multiple queries (testing cache)");
const queries = [
  "How to use Groq API?",
  // ...
];

for (const query of queries) {
  const queryStart = performance.now();
  const queryEmbedding = await generateEmbeddings(query);
  const queryElapsed = performance.now() - queryStart;
  // ...
}
```

The same test builds sample pages with pre-computed embeddings:

```typescript
const pages = [
  {
    title: "Introduction to Groq",
    content: "Groq is a fast AI inference platform that provides APIs for various language models.",
    embeddings: await generateEmbeddings("Groq is a fast AI inference platform that provides APIs for various language models."),
  },
  {
    title: "API Keys",
    content: "Learn how to create and manage your Groq API keys for authentication.",
    embeddings: await generateEmbeddings("Learn how to create and manage your Groq API keys for authentication."),
  },
  {
    title: "Available Models",
    content: "Groq supports various language models including Llama, Mixtral, and Gemma.",
    embeddings: await generateEmbeddings("Groq supports various language models including Llama, Mixtral, and Gemma."),
  },
];
```
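The excerpt stops before the actual search call. As a rough usage sketch (assuming the sample array above is named `pages` and satisfies the `Page` type, and using the `search(query, pages, options)` signature shown in the `mixedbread.ts` excerpt; this is not the repository's test code):

```typescript
import { searchStrategy } from "../transformers-local-onnx.ts";

// Rank the sample pages against a query; `limit` caps the number of results.
const results = await searchStrategy.search("What is Groq?", pages, { limit: 3 });

console.log(`Top ${results.length} results (sorted by relevance):`);
for (const result of results) {
  console.log(result); // SearchResult fields are defined in types.ts
}
```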
"./transformers-local-onnx.ts";``````typescript// import { searchStrategy, generateEmbeddings } from "./transformers-cosine.ts";``````typescript// Comment out the current strategy// import { searchStrategy, generateEmbeddings } from "./transformers-cosine.ts";// Uncomment the local ONNX strategyimport { searchStrategy, generateEmbeddings } from "./transformers-local-onnx.ts";``` total model load: 239ms✅ Generated embeddings successfully! Dimensions: 384 First 5 values: [-0.0457, -0.0109, -0.0935, ...] Time: 247msTest 2: Generate embeddings for multiple queries (testing cache)✅ "How to use Groq API?" Time: 3.87ms (cached pipeline)**Fix**: Check `search/index.ts` has the correct import:```typescriptimport { searchStrategy, generateEmbeddings } from "./transformers-local-onnx.ts";```- Slightly less accurate: ~57 vs ~58 MTEB score### Optional: Recalculate EmbeddingsIf you were using a different strategy before, regenerate embeddings:```bash```This ensures all page embeddings use the same model.## Performance Comparison