Readme

Part of Val Town Semantic Search.

Searches embeddings of all vals stored in Val Town's blob storage by downloading every embedding and iterating through all of them to compute similarity to the query. Slow and terrible, but it works!

  • Get metadata from blob storage: allValsBlob${dimensions}EmbeddingsMeta (currently allValsBlob1536EmbeddingsMeta), which maps each indexed val to the location of its embedding (batchDataIndex identifies the data blob, and valIndex is the position of the embedding within that blob, counted in embeddings rather than bytes).
  • Get all data blobs referenced by the metadata, e.g. allValsBlob1536EmbeddingsData_0 for batchDataIndex 0.
  • Call OpenAI to generate an embedding for the search query, using the same number of dimensions.
  • Go through all stored embeddings and compute the cosine similarity of each one with the query embedding.
  • Return the list sorted by similarity, most similar first (the code returns the top 50).
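
The lookup described in the list above can be illustrated with a small sketch. It assumes each data blob is a flat array of 32-bit floats holding embeddings back to back, which is what the byte-offset arithmetic in the code implies. The EmbeddingMeta interface and the embeddingAt helper are hypothetical names introduced only for this example.

```typescript
// Hypothetical shape of one metadata entry, inferred from the description above.
interface EmbeddingMeta {
  batchDataIndex: number; // which data blob holds the embedding
  valIndex: number; // position of the embedding within that blob (not a byte offset)
}

const BYTES_PER_FLOAT = Float32Array.BYTES_PER_ELEMENT; // 4 bytes per 32-bit float

// Return a view (no copy) onto a single embedding inside a batch buffer.
// Assumes embeddings are packed contiguously as 32-bit floats.
function embeddingAt(batch: ArrayBufferLike, meta: EmbeddingMeta, dimensions: number): Float32Array {
  const byteOffset = meta.valIndex * dimensions * BYTES_PER_FLOAT;
  return new Float32Array(batch, byteOffset, dimensions);
}

// Usage with a tiny synthetic batch of two 3-dimensional embeddings.
const demoBatch = new Float32Array([1, 2, 3, 4, 5, 6]).buffer;
const second = embeddingAt(demoBatch, { batchDataIndex: 0, valIndex: 1 }, 3);
console.log(Array.from(second)); // [4, 5, 6]
```

Because the typed array is only a view onto the downloaded buffer, reading an embedding this way does not copy any data, which matters when every stored embedding is visited on each search.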
import { blob } from "https://esm.town/v/std/blob";
import cosSimilarity from "npm:cos-similarity";
import _ from "npm:lodash";
import OpenAI from "npm:openai";

// Embedding size; also part of the blob names.
const dimensions = 1536;

export default async function semanticSearchPublicVals(query: string) {
  // Metadata maps "author!!name!!version" to { batchDataIndex, valIndex }.
  const allValsBlobEmbeddingsMeta = (await blob.getJSON(`allValsBlob${dimensions}EmbeddingsMeta`)) ?? {};
  const allBatchDataIndexes = _.uniq(Object.values(allValsBlobEmbeddingsMeta).map((item: any) => item.batchDataIndex));

  // Download every referenced data blob in parallel.
  const embeddingsBatches: ArrayBuffer[] = [];
  await Promise.all(allBatchDataIndexes.map(async (batchDataIndex: any) => {
    const embeddingsBatchBlobName = `allValsBlob${dimensions}EmbeddingsData_${batchDataIndex}`;
    const response = await blob.get(embeddingsBatchBlobName);
    const data = await response.arrayBuffer();
    embeddingsBatches[batchDataIndex] = data;
    console.log(`Loaded ${embeddingsBatchBlobName} (${data.byteLength} bytes)`);
  }));

  // Embed the query with the same model dimensions as the stored embeddings.
  const openai = new OpenAI();
  const queryEmbedding = (await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: query,
    dimensions: dimensions,
  })).data[0].embedding;

  // Score every stored embedding against the query.
  const res = [];
  for (const id in allValsBlobEmbeddingsMeta) {
    const meta = allValsBlobEmbeddingsMeta[id];
    // View onto one embedding: 4 bytes per float, `dimensions` floats per embedding.
    const embedding = new Float32Array(
      embeddingsBatches[meta.batchDataIndex],
      dimensions * 4 * meta.valIndex,
      dimensions,
    );
    const [author_username, name, version] = id.split("!!");
    res.push({ author_username, name, version, similarity: cosSimilarity(embedding as any, queryEmbedding) });
  }

  // Most similar first; return the top 50.
  res.sort((a, b) => b.similarity - a.similarity);
  console.log(`Processed ${res.length} records`);
  return res.slice(0, 50);
}

const exampleQuery = "check dynamicland website for changes and email me";
console.log(await semanticSearchPublicVals(exampleQuery));
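
For readers unfamiliar with the scoring step, here is a minimal sketch of cosine similarity and the ranking built on it. It is a hand-rolled stand-in for illustration, not the implementation used by the cos-similarity package, and the names cosineSimilarity, Scored and rankBySimilarity are hypothetical.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros.
function cosineSimilarity(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

interface Scored {
  id: string;
  similarity: number;
}

// Score every candidate against the query and keep the k most similar.
function rankBySimilarity(
  query: ArrayLike<number>,
  candidates: Record<string, ArrayLike<number>>,
  k: number,
): Scored[] {
  return Object.entries(candidates)
    .map(([id, embedding]) => ({ id, similarity: cosineSimilarity(embedding, query) }))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, k);
}

// Usage with toy 2-dimensional vectors.
const ranked = rankBySimilarity([1, 0], { a: [0, 1], b: [1, 0], c: [1, 1] }, 2);
console.log(ranked.map((r) => r.id)); // ["b", "c"]
```

This brute-force scan does work proportional to the number of stored embeddings times the number of dimensions on every query, which is the "slow" part the README mentions.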
May 30, 2024