Everything you need to go from zero to running semantic search in your own Val Town project.
A semantic search engine that understands meaning. Unlike normal search (which matches exact words), SlimArmor finds results based on what text means.
Real example:
- You store: "The patient requires immediate surgery"
- You search: "medical emergency"
- It finds it — even though none of the words match ✅
This works because text is converted into embeddings — lists of numbers that capture meaning. Similar meanings produce similar numbers, so we can measure "how close" two pieces of text are in meaning-space.
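The "how close in meaning-space" idea can be seen with a toy sketch. The three-dimensional vectors below are made up for illustration (real embedding models output hundreds or thousands of dimensions), but the distance math is the same:

```javascript
// Cosine distance = 1 - cosine similarity; lower means closer in meaning.
function cosineDistance(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy 3-dimensional "embeddings" (values invented for illustration)
const surgery = [0.9, 0.1, 0.0];   // "The patient requires immediate surgery"
const emergency = [0.8, 0.2, 0.1]; // "medical emergency"
const cooking = [0.0, 0.1, 0.9];   // "pasta recipe"

// The medically related texts sit much closer together in meaning-space
console.log(cosineDistance(surgery, emergency) < cosineDistance(surgery, cooking)); // true
```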
- Go to val.town/x/kamenxrider/slimarmor
- Click Fork (top right)
- Your own copy is now live!
SlimArmor works with any OpenAI-compatible embedding API. You just need an API key and to know what dimensions your chosen model outputs.
💡 What are dimensions? When text is converted to an embedding, it becomes a list of numbers — the "dimension" is how many numbers long that list is. SlimArmor bakes this number into the database schema when it first runs, so you need to pick a model and stick with it. Changing models later requires a full reset.
Recommended options:
| Provider | Model | Dimensions | Sign up |
|---|---|---|---|
| Nebius (default) | Qwen/Qwen3-Embedding-8B | 4096 | nebius.com — free tier |
| OpenAI | text-embedding-3-small | 1536 | platform.openai.com |
| OpenAI | text-embedding-3-large | 3072 | platform.openai.com |
| Any other | your choice | check docs | — |
Higher dimensions = better quality but more storage used. For most use cases, any of the above work great.
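To put rough numbers on the storage trade-off: assuming each dimension is stored as a 4-byte float32 (the exact on-disk format depends on the database, so treat this as a ballpark), the cost per record scales linearly with dimensions:

```javascript
// Rough per-record embedding size, assuming 4-byte float32 values.
// Actual storage depends on the database's vector format.
const bytesPerVector = (dims) => dims * 4;

console.log(bytesPerVector(1536)); // 6144 bytes, ~6 KB per record
console.log(bytesPerVector(4096)); // 16384 bytes, ~16 KB per record
```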
Pick one, get your API key, and move on.
In your forked val on Val Town:
- Click Settings (the gear icon)
- Go to Environment Variables
- Add these based on your chosen provider:
If using Nebius (default):
| Key | Value |
|---|---|
| NEBIUS_API_KEY | Your Nebius API key |
If using OpenAI:
| Key | Value |
|---|---|
| EMBEDDING_PROVIDER | openai |
| OPENAI_API_KEY | Your OpenAI API key |
If using any other OpenAI-compatible API:
| Key | Value |
|---|---|
| EMBEDDING_API_URL | Your provider's /v1/embeddings URL |
| EMBEDDING_API_KEY | Your API key |
| EMBEDDING_MODEL | Your model name |
| EMBEDDING_DIM | Your model's output dimensions (e.g. 768) |
Always recommended:
| Key | Value |
|---|---|
| ADMIN_TOKEN | Any secret string (e.g. my-secret-123) |
ADMIN_TOKEN protects your write endpoints. Without it, anyone can add or delete your data.
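The check behind this is conceptually simple. Here is an illustrative sketch (hypothetical, SlimArmor's actual middleware may differ) of how a write endpoint compares the request's Authorization header against the configured token:

```javascript
// Illustrative bearer-token check; SlimArmor's real implementation may
// differ. adminToken would come from the ADMIN_TOKEN environment variable.
function isAuthorized(authHeader, adminToken) {
  if (!adminToken) return false; // no token configured: deny writes
  return authHeader === `Bearer ${adminToken}`;
}

console.log(isAuthorized("Bearer my-secret-123", "my-secret-123")); // true
console.log(isAuthorized("Bearer wrong-token", "my-secret-123"));   // false
```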
Click on api.ts in your val. At the top you'll see an endpoint URL like:
```
https://yourusername--abc123.web.val.run
```
Open that URL in a browser — you should see the API info page. That's your SlimArmor instance! 🎉
Visit https://YOUR_ENDPOINT/ui for a terminal-style interface.
Type `help` to see all commands. To add your first record:
```
auth your-admin-token
upsert my-first-note "The quick brown fox jumps over the lazy dog"
```
Then search:
```
search "animals jumping"
```
```bash
# Replace with your actual values
ENDPOINT="https://YOUR_ENDPOINT"
TOKEN="your-admin-token"

# Add a record
curl -X POST $ENDPOINT/upsert \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"id": "note-1", "text": "The quick brown fox jumps over the lazy dog"}'

# Search
curl -X POST $ENDPOINT/search \
  -H "Content-Type: application/json" \
  -d '{"query": "animals jumping", "k": 5}'
```
```javascript
const ENDPOINT = "https://YOUR_ENDPOINT";
const TOKEN = "your-admin-token";

// Add a record
await fetch(`${ENDPOINT}/upsert`, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${TOKEN}`,
  },
  body: JSON.stringify({
    id: "note-1",
    text: "The quick brown fox jumps over the lazy dog",
    meta: { category: "example" },
  }),
});

// Search
const res = await fetch(`${ENDPOINT}/search`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ query: "animals jumping", k: 5 }),
});
const { results } = await res.json();
console.log(results);
```
A search result looks like this:
```json
{
  "id": "note-1",
  "text": "The quick brown fox jumps over the lazy dog",
  "meta": { "category": "example" },
  "distance": 0.52
}
```
The key field is distance — it tells you how similar the result is to your query:
| Distance | What it means | Should you include it? |
|---|---|---|
| 0.0 – 0.3 | Near-identical meaning | Always ✅ |
| 0.3 – 0.5 | Very similar | Yes ✅ |
| 0.5 – 0.65 | Related | Usually ✅ |
| 0.65 – 0.75 | Loosely related | Maybe ⚠️ |
| 0.75+ | Probably unrelated | No ❌ |
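The table above can be turned into a small client-side helper for labeling results. The bucket boundaries here are the guideline values from the table, not anything enforced by the API:

```javascript
// Map a distance to the guideline buckets above. These boundaries are
// rules of thumb, not API-enforced values; tune them for your data.
function relevanceLabel(distance) {
  if (distance < 0.3) return "near-identical";
  if (distance < 0.5) return "very similar";
  if (distance < 0.65) return "related";
  if (distance < 0.75) return "loosely related";
  return "probably unrelated";
}

console.log(relevanceLabel(0.52)); // "related"
```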
Use maxDistance to filter out weak matches:
```json
{ "query": "animals jumping", "k": 10, "maxDistance": 0.65 }
```
Not sure what threshold to use? Use the calibrate endpoint:
```
GET /calibrate?q=your+search+query
```
It analyzes your actual data and recommends tight/balanced/loose thresholds.
```javascript
// Store notes
await fetch(`${ENDPOINT}/upsert`, {
  method: "POST",
  headers: { "Content-Type": "application/json", "Authorization": `Bearer ${TOKEN}` },
  body: JSON.stringify([
    { id: "note-2024-01", text: "Meeting with Sarah about Q4 budget planning", meta: { date: "2024-01", tag: "work" } },
    { id: "note-2024-02", text: "Research best frameworks for mobile development", meta: { date: "2024-01", tag: "tech" } },
    { id: "note-2024-03", text: "Book flight to Amsterdam for the conference", meta: { date: "2024-01", tag: "travel" } },
  ]),
});

// Find work-related notes
const res = await fetch(`${ENDPOINT}/search`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ query: "work meetings finance", k: 5, maxDistance: 0.65 }),
});
```
For long documents, use chunked upsert to split them first:
```bash
curl -X POST $ENDPOINT/upsert_chunked \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "id": "my-essay",
    "text": "...a very long piece of text...",
    "meta": { "source": "blog", "author": "Alice" },
    "chunkSize": 800,
    "overlap": 100
  }'
```
Each chunk is stored separately (`my-essay::chunk1`, `my-essay::chunk2`, etc.) and can be searched individually.
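The way `chunkSize` and `overlap` interact can be sketched with a simple character-based splitter. SlimArmor's actual chunker may be smarter (splitting on word or sentence boundaries, for example); this just illustrates the mechanics:

```javascript
// Character-based chunking sketch: each chunk is chunkSize characters,
// and consecutive chunks share `overlap` characters so that no sentence
// is cut off without context. An assumption-level illustration only.
function chunkText(text, chunkSize, overlap) {
  const chunks = [];
  const step = chunkSize - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached the end
  }
  return chunks;
}

const chunks = chunkText("a".repeat(2000), 800, 100);
console.log(chunks.length);    // 3 chunks covering 0-800, 700-1500, 1400-2000
console.log(chunks[1].length); // 800
```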
Use meta fields to organize data, then filter on search:
```bash
# Store records with categories
curl -X POST $ENDPOINT/upsert \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '[
    {"id": "r1", "text": "Python tutorial for beginners", "meta": {"type": "article"}},
    {"id": "r2", "text": "Python course on Coursera", "meta": {"type": "course"}},
    {"id": "r3", "text": "JavaScript for web development", "meta": {"type": "article"}}
  ]'

# Search only within articles
curl -X POST $ENDPOINT/search \
  -H "Content-Type: application/json" \
  -d '{"query": "learn programming", "k": 5, "filters": {"type": "article"}}'
```
```javascript
// Store FAQ pairs using the question as the text
await fetch(`${ENDPOINT}/upsert`, {
  method: "POST",
  headers: { "Content-Type": "application/json", "Authorization": `Bearer ${TOKEN}` },
  body: JSON.stringify([
    { id: "faq-1", text: "How do I reset my password?", meta: { answer: "Go to Settings → Security → Reset Password" } },
    { id: "faq-2", text: "How do I cancel my subscription?", meta: { answer: "Go to Billing → Cancel Plan" } },
    { id: "faq-3", text: "How do I contact support?", meta: { answer: "Email support@example.com" } },
  ]),
});

// When a user asks a question, find the closest FAQ
const userQuestion = "I forgot my password, what do I do?";
const res = await fetch(`${ENDPOINT}/search`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ query: userQuestion, k: 1, maxDistance: 0.6 }),
});
const { results } = await res.json();
if (results.length > 0) {
  console.log("Answer:", results[0].meta.answer);
}
```
Instead of using the HTTP API, you can import SlimArmor's core directly into another val:
```ts
import * as db from "https://esm.town/v/kamenxrider/slimarmor/vectordb.ts";

export default async function handler(req: Request) {
  // Setup runs once per cold start (idempotent)
  await db.setup();

  const url = new URL(req.url);

  if (req.method === "POST" && url.pathname === "/add") {
    const { id, text } = await req.json();
    await db.upsert(id, text);
    return Response.json({ ok: true });
  }

  if (req.method === "POST" && url.pathname === "/find") {
    const { query } = await req.json();
    const results = await db.search(query, 5, 0.65);
    return Response.json({ results });
  }

  return new Response("Not found", { status: 404 });
}
```
The library uses your val's own SQLite database — you don't need to run the API separately. Just import and use.
- Use meaningful IDs — `blog-post-2024-01` is better than `1`
- Keep text focused — shorter, topic-focused chunks search better than walls of text
- Use metadata — store category, date, author, tags etc. so you can filter later
- Calibrate your threshold — use `/calibrate?q=...` before going to production
- Batch your upserts — send arrays of records instead of one at a time (much faster)
- Storing empty or near-duplicate text — SlimArmor deduplicates by content hash, so identical text won't re-embed, but similar-but-different text will generate redundant embeddings
- Deleting via raw SQL — always use `POST /clear` or `POST /delete` so the vector index stays in sync
- Switching models without clearing — embeddings from different models are completely incompatible. A vector from model A is meaningless when compared to a vector from model B. Always export first, then clear, then re-import with the new model.
- Deduplication is automatic — if you upsert the same `id` with the same text, it skips the embedding API call and only updates metadata. You can safely re-run upserts.
- Hybrid search helps with specific terms — if your data has product codes, names, or exact terms, enable `hybrid: { enabled: true }` to boost keyword matches.
- `/validate` is your friend — run it after setup to confirm everything is working before adding real data.
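The content-hash deduplication mentioned above can be sketched like this. The hash function below is a tiny FNV-1a, chosen only for illustration; SlimArmor's exact hashing scheme isn't documented here, and a real system would typically use a cryptographic hash such as SHA-256:

```javascript
// Tiny FNV-1a hash, for illustration only. Identical text always produces
// the same hash, so a store can skip the embedding API call for duplicates.
function fnv1a(text) {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

const seen = new Set(); // hashes of text already embedded
function needsEmbedding(text) {
  const h = fnv1a(text);
  if (seen.has(h)) return false; // identical content: reuse the stored embedding
  seen.add(h);
  return true;
}

console.log(needsEmbedding("The quick brown fox"));  // true  (first time)
console.log(needsEmbedding("The quick brown fox"));  // false (exact duplicate)
console.log(needsEmbedding("The quick brown fox!")); // true  (any change re-embeds)
```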
Make sure you're sending the header: `Authorization: Bearer YOUR_ADMIN_TOKEN`
In the browser CLI, type `auth your-token` first.
Your API key is wrong or expired. Go to your val's Settings → Environment Variables and update NEBIUS_API_KEY (or whichever provider you're using).
- Run `/calibrate?q=your+query` to see distance distributions
- Try lowering `maxDistance`
- Try enabling hybrid search: `"hybrid": {"enabled": true}`
The DiskANN index got out of sync (happens if you manually deleted rows via SQL). Fix it with:
```bash
curl -X POST $ENDPOINT/reindex -H "Authorization: Bearer $TOKEN"
```
Normal — each batch of records requires one API call to the embedding provider (~460ms). For bulk imports, batch as many records as possible in each /upsert call (arrays of up to ~96 records per batch are ideal).
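A small helper makes batched imports easy to get right. The 96-record ceiling is the guideline mentioned above, not a hard API limit, so adjust it for your provider:

```javascript
// Split records into batches for bulk /upsert calls. batchSize of 96
// follows the guideline above; tune it for your embedding provider.
function toBatches(records, batchSize = 96) {
  const batches = [];
  for (let i = 0; i < records.length; i += batchSize) {
    batches.push(records.slice(i, i + batchSize));
  }
  return batches;
}

const records = Array.from({ length: 250 }, (_, i) => ({ id: `doc-${i}`, text: `text ${i}` }));
const batches = toBatches(records);
console.log(batches.length);    // 3 batches (96 + 96 + 58)
console.log(batches[2].length); // 58
```

Each batch can then be sent as a single JSON array to `/upsert`, as shown in the earlier examples.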
```bash
ENDPOINT="https://YOUR_ENDPOINT"
TOKEN="your-admin-token"

# Health check
curl $ENDPOINT/ping

# View stats
curl $ENDPOINT/stats

# Add one record
curl -X POST $ENDPOINT/upsert -H "Content-Type: application/json" -H "Authorization: Bearer $TOKEN" \
  -d '{"id":"doc-1","text":"Your text here","meta":{"category":"notes"}}'

# Add many records
curl -X POST $ENDPOINT/upsert -H "Content-Type: application/json" -H "Authorization: Bearer $TOKEN" \
  -d '[{"id":"a","text":"first"},{"id":"b","text":"second"}]'

# Search
curl -X POST $ENDPOINT/search -H "Content-Type: application/json" \
  -d '{"query":"your query","k":10,"maxDistance":0.65}'

# Search with filter
curl -X POST $ENDPOINT/search -H "Content-Type: application/json" \
  -d '{"query":"your query","k":10,"filters":{"category":"notes"}}'

# Get a record
curl "$ENDPOINT/get?id=doc-1"

# List IDs
curl "$ENDPOINT/list?limit=20"

# Delete a record
curl -X POST $ENDPOINT/delete -H "Content-Type: application/json" -H "Authorization: Bearer $TOKEN" \
  -d '{"id":"doc-1"}'

# Calibrate threshold
curl "$ENDPOINT/calibrate?q=your+query"

# Seed test data
curl -H "Authorization: Bearer $TOKEN" "$ENDPOINT/seed?n=50"

# Export
curl -H "Authorization: Bearer $TOKEN" "$ENDPOINT/export?limit=500"

# Clear all (careful!)
curl -X POST "$ENDPOINT/clear?confirm=yes" -H "Authorization: Bearer $TOKEN"

# Rebuild index
curl -X POST $ENDPOINT/reindex -H "Authorization: Bearer $TOKEN"
```
Happy searching! 🔍 If you get stuck, open the /ui browser CLI and type help.