SlimArmor — Technical Handover

Version: 4.1
Status: Production-ready
Last Updated: 2026-03-02

This document is for developers (or AI assistants) continuing work on SlimArmor. It covers architecture, design decisions, known limitations, and what to do next.

What This Is

SlimArmor is a self-hosted vector database for Val Town. It runs entirely on Val Town's built-in SQLite (powered by Turso/libSQL), which has native vector extensions (DiskANN index, F32_BLOB columns, vector_top_k). No external database is required.

The system consists of:

vectordb.ts — the core library (importable by other vals)
api.ts — the HTTP API layer
ui.ts — the browser CLI served at /ui

Architecture

Browser CLI (ui.ts)
      │
      ▼
HTTP API (api.ts)
  - Route matching
  - Input validation
  - Auth (bearer token)
  - Error handling
      │
      ▼
Vector DB Core (vectordb.ts)
  - setup()         Creates tables + DiskANN index (idempotent, guarded)
  - upsert()        Single record: hash check → embed → write
  - upsertMany()    Batch: hash check → batch embed → sqlite.batch() write
  - search()        embed query → vector_top_k → optional filter → optional hybrid
  - remove()        DELETE + FTS sync
  - stats()         COUNT + storage estimate
  - listIds()       Paginated ID list
  - get()           Single record fetch
  - reindex()       DROP + CREATE INDEX
      │                          │
      ▼                          ▼
Embedding API          Val Town SQLite (libSQL)
(OpenAI-compat)         - vectordb table (F32_BLOB)
  batch size: 96         - vectordb_embedding_idx (DiskANN)
  timeout: 30s           - vectordb_fts (FTS5)
                         - vectordb_meta

Key Design Decisions

1. Content-hash deduplication

Every record stores a SHA-256 hash of its text. On upsert, we compare hashes — if unchanged, we skip the embedding API call and only update metadata. This saves real money at scale (embedding APIs charge per token).
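
The hash check can be sketched as follows. This is a minimal illustration, not the code in vectordb.ts: the names hashText and planUpsert are hypothetical, and it assumes a runtime that provides node:crypto (Node, Deno, and Bun all do).

```typescript
import { createHash } from "node:crypto";

// SHA-256 of the record text, hex-encoded (64 characters).
function hashText(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

type UpsertPlan = "embed-and-write" | "metadata-only";

// storedHash is the text_hash currently in the table, or null for a new id.
// Hypothetical helper: decides whether the embedding API must be called.
function planUpsert(storedHash: string | null, newText: string): UpsertPlan {
  return storedHash === hashText(newText) ? "metadata-only" : "embed-and-write";
}
```

With this shape, an unchanged text costs one hash computation and one metadata write, and never reaches the embedding API.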

2. `sqlite.batch()` for bulk writes (v4.1)

upsertMany previously looped with individual sqlite.execute() calls. Now it builds arrays of statements and calls sqlite.batch([...]) once. This reduces write latency by ~3× for large batches. The FTS DELETE + INSERT pairs are included in the same batch.
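
A rough sketch of how such a statement array might be assembled is shown below. The SqlStatement and EmbeddedRecord types, the buildUpsertBatch name, and the exact SQL (including the ON CONFLICT clause) are assumptions for illustration; only the table and column names come from the schema in this document. vector32() is the libSQL function that converts a JSON-style array string into an F32 vector.

```typescript
interface SqlStatement {
  sql: string;
  args: (string | number | null)[];
}

interface EmbeddedRecord {
  id: string;
  text: string;
  textHash: string;
  embedding: number[];
  meta?: unknown;
}

// Builds one flat list: for each record, the main-table upsert followed by
// the FTS DELETE + INSERT pair, so everything goes out in a single batch.
function buildUpsertBatch(records: EmbeddedRecord[], now: number): SqlStatement[] {
  const statements: SqlStatement[] = [];
  for (const r of records) {
    statements.push({
      sql: `INSERT INTO vectordb (id, text, text_hash, embedding, meta_json, updated_at)
            VALUES (?, ?, ?, vector32(?), ?, ?)
            ON CONFLICT(id) DO UPDATE SET
              text = excluded.text,
              text_hash = excluded.text_hash,
              embedding = excluded.embedding,
              meta_json = excluded.meta_json,
              updated_at = excluded.updated_at`,
      args: [
        r.id,
        r.text,
        r.textHash,
        JSON.stringify(r.embedding), // e.g. "[0.1,0.2]"
        r.meta === undefined ? null : JSON.stringify(r.meta),
        now,
      ],
    });
    statements.push({ sql: "DELETE FROM vectordb_fts WHERE id = ?", args: [r.id] });
    statements.push({ sql: "INSERT INTO vectordb_fts (id, text) VALUES (?, ?)", args: [r.id, r.text] });
  }
  return statements;
}
```

The returned array would then be handed to sqlite.batch(...) in one call, so the number of round trips no longer grows with the number of records.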

3. `_setupDone` module-level guard (v4.1)

setup() was called on every non-UI request. Even though CREATE TABLE IF NOT EXISTS is idempotent, it still hits SQLite 3–5 times. The guard flag turns every subsequent call in a warm process into a no-op (instant return). The flag resets on cold start, which is fine, since setup only needs to run once per process lifetime.
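
The guard pattern is simple enough to show in a few lines. This is a simplified, synchronous sketch: the real setup() talks to SQLite and is presumably asynchronous, and runSetupStatements is a hypothetical stand-in that only counts how often it runs.

```typescript
// Module-level flag: lives as long as the process stays warm.
let _setupDone = false;

// Stand-in for the CREATE TABLE / CREATE INDEX statements (hypothetical).
let setupRuns = 0;
function runSetupStatements(): void {
  setupRuns += 1;
}

function setup(): void {
  if (_setupDone) return; // warm call: skip all SQLite round trips
  runSetupStatements();
  _setupDone = true; // set only after the statements have succeeded
}
```

Setting the flag after the statements run means a failed setup is retried on the next request instead of being silently skipped.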

4. FTS5 as optional enhancement

FTS5 support is detected at runtime and cached in FTS_AVAILABLE. If unavailable, hybrid search gracefully falls back to pure vector. This keeps the core robust across environments.

5. DiskANN index configuration

The index is created with these non-default settings for storage efficiency:

max_neighbors=64 — reduces from ~192 default; saves index storage
compress_neighbors=float8 — 75% compression vs float32; small accuracy tradeoff
metric=cosine — appropriate for text embeddings

These can all be overridden via env vars and applied with POST /reindex.
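
As an illustration, the index statement could be derived from the environment roughly like this. The function names and validation rules are assumptions; the env var names, defaults, and allowed values are taken from the environment variable reference in this document.

```typescript
interface IndexSettings {
  metric: string;
  maxNeighbors: number;
  compressNeighbors: string;
}

const METRICS = ["cosine", "l2"];
const COMPRESSIONS = ["float8", "float16", "floatb16", "float32", "float1bit", "none"];

// Reads the three storage-related settings, falling back to the documented defaults.
function readIndexSettings(env: Record<string, string | undefined>): IndexSettings {
  const metric = env.INDEX_METRIC ?? "cosine";
  const compressNeighbors = env.INDEX_COMPRESS_NEIGHBORS ?? "float8";
  const maxNeighbors = Number(env.INDEX_MAX_NEIGHBORS ?? "64");
  // Allow-list checks: these values are interpolated into SQL below.
  if (!METRICS.includes(metric)) throw new Error(`invalid INDEX_METRIC: ${metric}`);
  if (!COMPRESSIONS.includes(compressNeighbors)) {
    throw new Error(`invalid INDEX_COMPRESS_NEIGHBORS: ${compressNeighbors}`);
  }
  if (!Number.isInteger(maxNeighbors) || maxNeighbors < 8 || maxNeighbors > 256) {
    throw new Error(`invalid INDEX_MAX_NEIGHBORS: ${env.INDEX_MAX_NEIGHBORS}`);
  }
  return { metric, maxNeighbors, compressNeighbors };
}

function buildCreateIndexSql(s: IndexSettings): string {
  const options = [
    `metric=${s.metric}`,
    `max_neighbors=${s.maxNeighbors}`,
    `compress_neighbors=${s.compressNeighbors}`,
  ];
  const optionList = options.map((o) => `'${o}'`).join(", ");
  return `CREATE INDEX IF NOT EXISTS vectordb_embedding_idx ON vectordb (libsql_vector_idx(embedding, ${optionList}))`;
}
```

Because index options cannot be bound as SQL parameters, validating them against an allow-list before interpolation is the safer choice.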

6. `EMBEDDING_DIM=auto`

When set, the first request probes the embedding API with a test string to detect the dimension, then stores it in vectordb_meta. Subsequent requests read from there. This avoids schema mismatches when switching models.

Database Schema (as created)

CREATE TABLE vectordb_meta (
  key   TEXT PRIMARY KEY,
  value TEXT
);

CREATE TABLE vectordb (
  id         TEXT PRIMARY KEY,
  text       TEXT NOT NULL,
  text_hash  TEXT NOT NULL,       -- SHA-256 of text content
  embedding  F32_BLOB(4096),      -- actual dim varies by provider
  meta_json  TEXT,                -- JSON blob, nullable
  updated_at INTEGER NOT NULL     -- Unix ms
);

CREATE INDEX vectordb_embedding_idx
ON vectordb (libsql_vector_idx(embedding,
  'metric=cosine',
  'max_neighbors=64',
  'compress_neighbors=float8'
));

CREATE VIRTUAL TABLE vectordb_fts USING fts5(id, text);

Shadow tables created automatically by libSQL:

vectordb_embedding_idx_shadow
vectordb_embedding_idx_shadow_idx
libsql_vector_meta_shadow
vectordb_fts_config, vectordb_fts_content, vectordb_fts_data, vectordb_fts_docsize, vectordb_fts_idx
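
Given this schema, a top-K query against the DiskANN index might be built as sketched below. The buildTopKQuery helper and the exact SELECT list are assumptions, not the query used in vectordb.ts. vector_top_k, vector32, and vector_distance_cos are libSQL functions; vector_top_k yields the rowid of each match in a column named id, which is why the join is on rowid. Binding k as a parameter is also an assumption; if that turns out not to be supported, a validated integer can be inlined instead.

```typescript
interface TopKStatement {
  sql: string;
  args: (string | number)[];
}

// Builds a parameterized nearest-neighbor query for the vectordb table.
function buildTopKQuery(queryEmbedding: number[], k: number): TopKStatement {
  if (!Number.isInteger(k) || k < 1) throw new Error("k must be a positive integer");
  const vec = JSON.stringify(queryEmbedding); // vector32() accepts "[x,y,...]"
  return {
    sql: `SELECT v.id, v.text, v.meta_json,
                 vector_distance_cos(v.embedding, vector32(?)) AS distance
          FROM vector_top_k('vectordb_embedding_idx', vector32(?), ?) AS t
          JOIN vectordb AS v ON v.rowid = t.id
          ORDER BY distance ASC`,
    args: [vec, vec, k],
  };
}
```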

Critical Warning: Don't Manually Delete Rows

If you DELETE FROM vectordb via raw SQL, the DiskANN shadow tables go out of sync. Subsequent inserts will fail with:

SQLITE_UNKNOWN: SQLite error: vector index(insert): failed to insert shadow row

Fix: Call POST /reindex to drop and recreate the index.

Prevention: Always use POST /clear?confirm=yes, which handles both the main table and FTS atomically.

Embedding API Details

SlimArmor is provider-agnostic — any OpenAI-compatible embedding API works. The provider selection is purely a convenience layer for pre-filling API URLs and default model names. What actually matters:

Protocol: POST /v1/embeddings with body {"model": "...", "input": ["text1", "text2", ...]}
Response: standard OpenAI format with data[].embedding (array of floats) and data[].index
Dimensions: the length of each returned embedding array — this is the critical value. It's locked into the F32_BLOB(N) column type at table creation time and cannot be changed without a full DB reset
Batch size: 96 texts per API call (a deliberately conservative value, intended to stay within the per-request input limits of all compatible providers)
Timeout: 30 seconds (AbortController)
Response parsing: indexed by item.index to handle out-of-order responses correctly
Dim assertion: all returned vectors are checked against the expected dimension and will throw on mismatch
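
The batching, index-based ordering, and dimension check described above can be sketched as two small pure functions. The names toBatches and orderEmbeddings are hypothetical; the 96-item batch size and the data[].index / data[].embedding fields are the ones named in this section.

```typescript
const BATCH_SIZE = 96;

// Splits the input texts into groups of at most `size` items.
function toBatches<T>(items: T[], size: number = BATCH_SIZE): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

interface EmbeddingItem {
  index: number;
  embedding: number[];
}

// Places each returned vector at the position given by item.index, so the
// result lines up with the input order even if the API answers out of order.
function orderEmbeddings(data: EmbeddingItem[], expectedCount: number, expectedDim: number): number[][] {
  const out: (number[] | undefined)[] = new Array(expectedCount).fill(undefined);
  for (const item of data) {
    if (!Number.isInteger(item.index) || item.index < 0 || item.index >= expectedCount) {
      throw new Error(`unexpected index ${item.index}`);
    }
    if (item.embedding.length !== expectedDim) {
      throw new Error(`dimension mismatch: expected ${expectedDim}, got ${item.embedding.length}`);
    }
    out[item.index] = item.embedding;
  }
  const missing = out.findIndex((v) => v === undefined);
  if (missing !== -1) throw new Error(`missing embedding for input ${missing}`);
  return out as number[][];
}
```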

Dimension detection flow

If EMBEDDING_DIM is set to a number → use that value directly
If EMBEDDING_DIM=auto → probe the API with a test string, store result in vectordb_meta, reuse on subsequent calls
If EMBEDDING_DIM is unset → use the provider preset default (e.g. 4096 for Nebius, 1536 for OpenAI)

Once the table is created, RESOLVED_DIM is cached in memory and also persisted in vectordb_meta so it survives cold starts.
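
The decision flow above can be expressed as a small pure function. This is a sketch under assumptions: resolveDimSource and the DimSource type are hypothetical names, and the actual probe call and the vectordb_meta read are left to the caller.

```typescript
type DimSource =
  | { kind: "fixed"; dim: number } // EMBEDDING_DIM set to a number
  | { kind: "stored"; dim: number } // auto, and a value already exists in vectordb_meta
  | { kind: "probe" } // auto, nothing stored yet: caller must probe the API
  | { kind: "preset"; dim: number }; // unset: provider preset default

function resolveDimSource(
  envValue: string | undefined,
  storedDim: number | null,
  presetDim: number,
): DimSource {
  if (envValue === undefined || envValue.trim() === "") {
    return { kind: "preset", dim: presetDim };
  }
  if (envValue.trim().toLowerCase() === "auto") {
    return storedDim !== null ? { kind: "stored", dim: storedDim } : { kind: "probe" };
  }
  const dim = Number(envValue);
  if (!Number.isInteger(dim) || dim <= 0) {
    throw new Error(`invalid EMBEDDING_DIM: ${envValue}`);
  }
  return { kind: "fixed", dim };
}
```

Keeping the decision separate from the I/O makes the three branches easy to test without touching the embedding API.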

Auth Model

If ADMIN_TOKEN env var is not set: open mode — all operations allowed
If ADMIN_TOKEN is set: write operations require Authorization: Bearer <token> header
Read operations (/search, /get, /list, /stats, /ping, /calibrate) are always public
The browser CLI stores the token in sessionStorage (cleared on tab close)
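
These rules reduce to a short check. The sketch below is illustrative only: isAllowed is a hypothetical name, and it assumes the route path has already been normalized before the check.

```typescript
// Read endpoints that stay public even when ADMIN_TOKEN is set.
const PUBLIC_PATHS = new Set(["/search", "/get", "/list", "/stats", "/ping", "/calibrate"]);

function isAllowed(path: string, authHeader: string | null, adminToken: string | undefined): boolean {
  if (!adminToken) return true; // open mode: no token configured
  if (PUBLIC_PATHS.has(path)) return true; // reads are always public
  return authHeader === `Bearer ${adminToken}`; // writes need the exact bearer token
}
```

A production version might prefer a constant-time comparison for the token; plain equality keeps the sketch short.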

Module-Level State

These variables live at module scope in vectordb.ts and reset on cold start:

Variable	Purpose	Reset on cold start?
`RESOLVED_DIM`	Cached embedding dimension	Yes — re-read from `vectordb_meta`
`FTS_AVAILABLE`	Whether FTS5 is usable	Yes — re-detected on first call
`_setupDone`	Setup guard flag	Yes — setup re-runs once per process

Cold starts in Val Town happen frequently. These caches only help while the process stays warm, which is fine: they prevent redundant work across requests served by the same warm process, not across cold starts.

Known Limitations

1. Fixed embedding dimension per database

The table schema hardcodes the vector dimension at creation time (for example, F32_BLOB(4096)). Changing providers or models requires:

POST /export to save text + meta
POST /clear?confirm=yes to wipe the DB
Update env vars
POST /import to re-embed everything

EMBEDDING_DIM=auto helps by detecting the dimension dynamically, but once the table exists you can't change it without a full reset.

2. Naive chunking

The built-in chunkText() function splits on character count with a whitespace-finding heuristic. It doesn't respect sentence boundaries, paragraphs, or semantic units. For production RAG use cases, consider pre-chunking text before inserting.
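
To make the limitation concrete, here is a sketch of a character-count splitter with a whitespace heuristic. It is not the actual chunkText() implementation, whose details are not shown in this document; chunkTextSketch and its parameters are hypothetical.

```typescript
function isWhitespace(ch: string): boolean {
  return /\s/.test(ch);
}

// Splits text into chunks of at most maxChars characters, preferring to cut
// at the last whitespace inside the window. Words longer than maxChars are
// split mid-word. Sentence and paragraph boundaries are ignored entirely.
function chunkTextSketch(text: string, maxChars: number): string[] {
  if (!Number.isInteger(maxChars) || maxChars < 1) {
    throw new Error("maxChars must be a positive integer");
  }
  const chunks: string[] = [];
  let start = 0;
  while (true) {
    // Skip whitespace between chunks.
    while (start < text.length && isWhitespace(text.charAt(start))) start++;
    if (start >= text.length) break;

    let end = Math.min(start + maxChars, text.length);
    if (end < text.length) {
      // Walk back from the window edge to the nearest whitespace, if any.
      let cut = end;
      while (cut > start && !isWhitespace(text.charAt(cut))) cut--;
      if (cut > start) end = cut;
    }
    chunks.push(text.slice(start, end).trimEnd());
    start = end;
  }
  return chunks;
}
```

A splitter like this will happily cut a sentence in half whenever the window edge falls inside it, which is why pre-chunking on semantic boundaries tends to give better retrieval quality.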

3. Hybrid search is re-ranking only

The hybrid mode re-ranks vector results using BM25 keyword scores. It does not perform a full union of vector + keyword candidates. Records that are keyword matches but outside the vector top-K are not surfaced.
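
The consequence is easiest to see in code. The sketch below assumes keyword scores have already been converted so that higher means a better match, and it uses a hypothetical linear blend with a hypothetical weight of 0.3; the actual scoring formula in vectordb.ts may differ.

```typescript
interface VectorHit {
  id: string;
  distance: number; // cosine distance from the vector search, lower is closer
}

interface RankedHit {
  id: string;
  score: number;
}

// Re-ranks the vector top-K only. Keyword matches that are not already in
// `hits` are never looked at, so they cannot appear in the output.
function rerankHybrid(hits: VectorHit[], keywordScores: Map<string, number>, keywordWeight = 0.3): RankedHit[] {
  // Normalize keyword scores against the best score among the vector hits.
  const maxKeyword = hits.reduce((max, h) => Math.max(max, keywordScores.get(h.id) ?? 0), 0);
  return hits
    .map((h) => {
      const vectorScore = 1 - h.distance;
      const keywordScore = maxKeyword > 0 ? (keywordScores.get(h.id) ?? 0) / maxKeyword : 0;
      return { id: h.id, score: (1 - keywordWeight) * vectorScore + keywordWeight * keywordScore };
    })
    .sort((a, b) => b.score - a.score);
}
```

In the usage below, record "z" has the strongest keyword score but is outside the vector top-K, so it is absent from the result. A union-based approach would have to fetch keyword candidates separately and merge them before ranking.

```typescript
const ranked = rerankHybrid(
  [
    { id: "a", distance: 0.10 },
    { id: "b", distance: 0.12 },
  ],
  new Map([
    ["b", 2],
    ["z", 5],
  ]),
);
// "b" overtakes "a" thanks to its keyword score; "z" is not returned at all.
```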

4. No cursor-based pagination

Pagination is offset-based (offset param on /list and /search). For large datasets with frequent inserts, results can shift between pages.

5. Single table / namespace

All records share one vectordb table. There's no multi-tenant or namespaced storage. If you need logical separation, use the prefix param on /list and metadata filters on /search, or fork and deploy separate instances.

Performance Benchmarks

Measured with Nebius Qwen/Qwen3-Embedding-8B (4096 dims), 105 records:

Operation	Latency
Embed 10 records (1 batch)	~1.2s
`upsertMany` 10 records	~1.4s total
`search` (vector only)	<100ms
`search` (hybrid)	~150ms
`setup()` cold (first call)	~200ms
`setup()` warm (guarded)	<1ms

Storage:

~22 KB per record (4096 dims × 4 bytes = 16 KB for the raw vector, plus the float8-compressed index neighbors, text, and overhead)
~47,500 records per 1 GB
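
The arithmetic behind these figures can be checked in a few lines. The 22 KB per-record figure is the measured value quoted above; treating KB and GB as binary units (1024-based) is an assumption made here because it matches the quoted record count most closely.

```typescript
const DIMS = 4096;
const BYTES_PER_FLOAT32 = 4;

// Raw F32_BLOB payload: 4096 * 4 = 16384 bytes (16 KiB).
const rawVectorBytes = DIMS * BYTES_PER_FLOAT32;

// Measured total per record, as quoted in the storage figures.
const observedBytesPerRecord = 22 * 1024;

// Whatever is not raw vector: index neighbors, text, row overhead (6144 bytes).
const overheadBytes = observedBytesPerRecord - rawVectorBytes;

// Records that fit in 1 GiB at the observed per-record size (47662).
const recordsPerGiB = Math.floor(1024 ** 3 / observedBytesPerRecord);
```

The result is in line with the ~47,500 estimate, and it shows that roughly three quarters of each record is the raw vector itself.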

Future Improvements

Prioritized by impact:

High value

Namespace/collection support — partition records by collection name (table prefix or extra column + index)
Hybrid union retrieval — merge vector candidates and keyword candidates before ranking
Async embed queue — background embedding via interval val, for non-blocking imports

Medium value

Sentence-aware chunking — use sentence boundaries for better chunk quality
Cursor pagination — stable pagination using updated_at + id cursor
Webhook on upsert — fire a webhook after batch upserts complete

Low value / nice to have

Multi-index — different embedding models in same DB
Range filters — meta.date > "2024-01-01" style filtering
Delete by filter — delete all records matching metadata criteria
Rate limiting — per-IP limits on search endpoint

Environment Variables (Complete Reference)

Variable	Default	Description
`ADMIN_TOKEN`	—	Enables auth for write ops
`EMBEDDING_PROVIDER`	`nebius`	Provider preset: `nebius`, `openai`, `openrouter`
`NEBIUS_API_KEY`	—	Nebius key (used when provider=nebius)
`OPENAI_API_KEY`	—	OpenAI key (used when provider=openai)
`OPENROUTER_API_KEY`	—	OpenRouter key (used when provider=openrouter)
`EMBEDDING_API_URL`	(preset)	Override API URL
`EMBEDDING_API_KEY`	—	Generic fallback key
`EMBEDDING_MODEL`	(preset)	Override model name
`EMBEDDING_DIM`	(preset)	Override dimensions, or `auto`
`INDEX_METRIC`	`cosine`	`cosine` or `l2`
`INDEX_MAX_NEIGHBORS`	`64`	Graph degree (8–256)
`INDEX_COMPRESS_NEIGHBORS`	`float8`	`float8`, `float16`, `floatb16`, `float32`, `float1bit`, `none`
`INDEX_ALPHA`	`1.2`	DiskANN density (≥1)
`INDEX_SEARCH_L`	`200`	Query-time effort
`INDEX_INSERT_L`	`70`	Insert-time effort
`ALLOW_WRITE_TESTS`	`0`	Enable `/validate?write=yes`
`ALLOW_WRITE_TESTS_NOAUTH`	`0`	Skip auth for write tests