Version: 4.1
Status: Production-ready
Last Updated: 2026-03-02
This document is for developers (or AI assistants) continuing work on SlimArmor. It covers architecture, design decisions, known limitations, and what to do next.
SlimArmor is a self-hosted vector database for Val Town. It runs entirely on Val Town's built-in SQLite (powered by Turso/libSQL), which has native vector extensions (DiskANN index, F32_BLOB columns, vector_top_k). No external database is required.
The system consists of:
- `vectordb.ts` — the core library (importable by other vals)
- `api.ts` — the HTTP API layer
- `ui.ts` — the browser CLI served at `/ui`
```
Browser CLI (ui.ts)
        │
        ▼
HTTP API (api.ts)
  - Route matching
  - Input validation
  - Auth (bearer token)
  - Error handling
        │
        ▼
Vector DB Core (vectordb.ts)
  - setup()      Creates tables + DiskANN index (idempotent, guarded)
  - upsert()     Single record: hash check → embed → write
  - upsertMany() Batch: hash check → batch embed → sqlite.batch() write
  - search()     embed query → vector_top_k → optional filter → optional hybrid
  - remove()     DELETE + FTS sync
  - stats()      COUNT + storage estimate
  - listIds()    Paginated ID list
  - get()        Single record fetch
  - reindex()    DROP + CREATE INDEX
        │                        │
        ▼                        ▼
Embedding API             Val Town SQLite (libSQL)
(OpenAI-compat)           - vectordb table (F32_BLOB)
batch size: 96            - vectordb_embedding_idx (DiskANN)
timeout: 30s              - vectordb_fts (FTS5)
                          - vectordb_meta
```
Every record stores a SHA-256 hash of its text. On upsert, we compare hashes — if unchanged, we skip the embedding API call and only update metadata. This saves real money at scale (embedding APIs charge per token).
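The hash-check decision can be sketched as follows (record shape and the `planUpsert` helper are illustrative, not the actual vectordb.ts API):

```typescript
import { createHash } from "node:crypto";

// Hypothetical record shape -- the real vectordb.ts row has more columns.
interface StoredRecord {
  text_hash: string;
  meta_json: string | null;
}

// SHA-256 hex digest of the record text.
const sha256 = (text: string): string =>
  createHash("sha256").update(text, "utf8").digest("hex");

// Decide whether an upsert needs a fresh embedding or only a metadata write.
function planUpsert(
  existing: StoredRecord | undefined,
  newText: string,
): "embed" | "meta-only" {
  const hash = sha256(newText);
  // Unchanged text: skip the embedding API call, update metadata only.
  if (existing && existing.text_hash === hash) return "meta-only";
  return "embed";
}
```

The same comparison runs per record inside `upsertMany`, so a re-import of mostly unchanged data only pays for the records that actually changed.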
upsertMany previously looped with individual sqlite.execute() calls. Now it builds arrays of statements and calls sqlite.batch([...]) once. This reduces write latency by ~3× for large batches. The FTS DELETE + INSERT pairs are included in the same batch.
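A minimal sketch of that batching pattern (the `Stmt` shape and `buildBatch` helper are illustrative; the real `upsertMany` writes more columns):

```typescript
// Statement shape compatible with a batched SQLite client call.
interface Stmt {
  sql: string;
  args: unknown[];
}

// Collect every write -- main-table upsert plus the FTS DELETE + INSERT
// pair -- into one array, to be submitted in a single round trip.
function buildBatch(records: { id: string; text: string }[]): Stmt[] {
  const stmts: Stmt[] = [];
  for (const r of records) {
    stmts.push({
      sql: "INSERT OR REPLACE INTO vectordb (id, text) VALUES (?, ?)",
      args: [r.id, r.text],
    });
    // Keep FTS in sync inside the same batch: remove any stale row, re-insert.
    stmts.push({ sql: "DELETE FROM vectordb_fts WHERE id = ?", args: [r.id] });
    stmts.push({
      sql: "INSERT INTO vectordb_fts (id, text) VALUES (?, ?)",
      args: [r.id, r.text],
    });
  }
  return stmts;
}

// Then, in the real code: await sqlite.batch(buildBatch(records));
```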
setup() was called on every non-UI request. Even though CREATE TABLE IF NOT EXISTS is idempotent, it still hits SQLite 3–5 times. The guard flag makes subsequent warm calls a no-op (instant return). Resets on cold start — which is fine, since setup only needs to run once per process lifetime.
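The guard is just a module-scope flag; a minimal sketch (the real `setup()` is async and runs the DDL shown later in this document):

```typescript
// Module-scope guard. Resets on cold start, which is the desired behavior:
// setup re-runs once per process lifetime.
let _setupDone = false;

// `runMigrations` stands in for the idempotent CREATE TABLE / CREATE INDEX
// statements (CREATE TABLE IF NOT EXISTS, ...).
function setup(runMigrations: () => void): void {
  if (_setupDone) return; // warm path: instant no-op, zero SQLite round trips
  runMigrations();
  _setupDone = true;
}
```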
FTS5 support is detected at runtime and cached in FTS_AVAILABLE. If unavailable, hybrid search gracefully falls back to pure vector. This keeps the core robust across environments.
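The detect-once pattern can be sketched like this (`exec` stands in for the SQLite client; the probe statement is illustrative):

```typescript
// Module-scope cache; undefined means "not yet detected". Re-detected after
// each cold start.
let FTS_AVAILABLE: boolean | undefined;

function ftsAvailable(exec: (sql: string) => void): boolean {
  if (FTS_AVAILABLE !== undefined) return FTS_AVAILABLE; // cached result
  try {
    // Probe: creating an FTS5 virtual table fails if the module is missing.
    exec("CREATE VIRTUAL TABLE IF NOT EXISTS vectordb_fts USING fts5(id, text)");
    FTS_AVAILABLE = true;
  } catch {
    FTS_AVAILABLE = false; // hybrid search falls back to pure vector
  }
  return FTS_AVAILABLE;
}
```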
The index is created with these non-default settings for storage efficiency:
- `max_neighbors=64` — reduced from the ~192 default; saves index storage
- `compress_neighbors=float8` — 75% compression vs float32; small accuracy tradeoff
- `metric=cosine` — appropriate for text embeddings
These can all be overridden via env vars and applied with POST /reindex.
When `EMBEDDING_DIM=auto` is set, the first request probes the embedding API with a test string to detect the dimension, then stores it in `vectordb_meta`. Subsequent requests read it from there. This avoids schema mismatches when switching models.
```sql
CREATE TABLE vectordb_meta (
  key   TEXT PRIMARY KEY,
  value TEXT
);

CREATE TABLE vectordb (
  id         TEXT PRIMARY KEY,
  text       TEXT NOT NULL,
  text_hash  TEXT NOT NULL,      -- SHA-256 of text content
  embedding  F32_BLOB(4096),     -- actual dim varies by provider
  meta_json  TEXT,               -- JSON blob, nullable
  updated_at INTEGER NOT NULL    -- Unix ms
);

CREATE INDEX vectordb_embedding_idx
  ON vectordb (libsql_vector_idx(embedding,
    'metric=cosine',
    'max_neighbors=64',
    'compress_neighbors=float8'
  ));

CREATE VIRTUAL TABLE vectordb_fts USING fts5(id, text);
```
Shadow tables created automatically by libSQL:
- `vectordb_embedding_idx_shadow`
- `vectordb_embedding_idx_shadow_idx`
- `libsql_vector_meta_shadow`
- `vectordb_fts_config`, `vectordb_fts_content`, `vectordb_fts_data`, `vectordb_fts_docsize`, `vectordb_fts_idx`
If you DELETE FROM vectordb via raw SQL, the DiskANN shadow tables go out of sync. Subsequent inserts will fail with:
```
SQLITE_UNKNOWN: SQLite error: vector index(insert): failed to insert shadow row
```
Fix: Call POST /reindex to drop and recreate the index.
Prevention: Always use POST /clear?confirm=yes which handles both the main table and FTS atomically.
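What `POST /reindex` does is essentially a drop-and-recreate, which rebuilds the shadow tables from the main table. A sketch (`exec` stands in for the SQLite client; the index options mirror the schema above):

```typescript
// Drop and recreate the DiskANN index so its shadow tables are rebuilt from
// the rows currently in vectordb.
function reindex(exec: (sql: string) => void): void {
  exec("DROP INDEX IF EXISTS vectordb_embedding_idx");
  exec(
    "CREATE INDEX vectordb_embedding_idx ON vectordb (" +
      "libsql_vector_idx(embedding, " +
      "'metric=cosine', 'max_neighbors=64', 'compress_neighbors=float8'))",
  );
}
```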
SlimArmor is provider-agnostic — any OpenAI-compatible embedding API works. The provider selection is purely a convenience layer for pre-filling API URLs and default model names. What actually matters:
- Protocol: `POST /v1/embeddings` with body `{"model": "...", "input": ["text1", "text2", ...]}`
- Response: standard OpenAI format with `data[].embedding` (array of floats) and `data[].index`
- Dimensions: the length of each returned embedding array — this is the critical value. It's locked into the `F32_BLOB(N)` column type at table creation time and cannot be changed without a full DB reset
- Batch size: 96 texts per API call (a conservative limit that all compatible providers should accept)
- Timeout: 30 seconds (AbortController)
- Response parsing: indexed by `item.index` to handle out-of-order responses correctly
- Dim assertion: all returned vectors are checked against the expected dimension and will throw on mismatch
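The last two points — index-based placement and the dimension assertion — can be sketched as a small parser (the `EmbeddingItem` shape follows the OpenAI response format; the function name is illustrative):

```typescript
// One entry of the OpenAI-format `data` array.
interface EmbeddingItem {
  index: number;
  embedding: number[];
}

// Place each vector by item.index (responses may arrive out of order) and
// assert every vector has the expected dimension.
function parseEmbeddings(
  items: EmbeddingItem[],
  count: number,
  dim: number,
): number[][] {
  const out: number[][] = new Array(count);
  for (const item of items) {
    if (item.embedding.length !== dim) {
      throw new Error(
        `dimension mismatch: got ${item.embedding.length}, expected ${dim}`,
      );
    }
    out[item.index] = item.embedding; // honor item.index, not array order
  }
  return out;
}
```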
- If `EMBEDDING_DIM` is set to a number → use that value directly
- If `EMBEDDING_DIM=auto` → probe the API with a test string, store the result in `vectordb_meta`, reuse on subsequent calls
- If `EMBEDDING_DIM` is unset → use the provider preset default (e.g. 4096 for Nebius, 1536 for OpenAI)
Once the table is created, `RESOLVED_DIM` is cached in memory and also persisted in `vectordb_meta` so it survives cold starts.
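The three-way resolution above can be sketched as (`probe` stands in for the one-time embedding-API call; preset defaults are the examples from this section):

```typescript
// Resolve the embedding dimension from the EMBEDDING_DIM env var.
function resolveDim(
  envDim: string | undefined,  // raw value of EMBEDDING_DIM
  presetDefault: number,       // e.g. 4096 for Nebius, 1536 for OpenAI
  probe: () => number,         // embeds a test string, returns vector length
): number {
  if (envDim && envDim !== "auto") return Number(envDim); // explicit override
  if (envDim === "auto") return probe(); // detect once, persist in vectordb_meta
  return presetDefault; // unset: provider preset default
}
```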
- If the `ADMIN_TOKEN` env var is not set: open mode — all operations allowed
- If `ADMIN_TOKEN` is set: write operations require an `Authorization: Bearer <token>` header
- Read operations (`/search`, `/get`, `/list`, `/stats`, `/ping`, `/calibrate`) are always public
- The browser CLI stores the token in `sessionStorage` (cleared on tab close)
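The auth rule reduces to a small predicate, sketched here (function name and header handling are illustrative, not the actual api.ts code):

```typescript
// Writes need a bearer token only when ADMIN_TOKEN is set; reads are always
// allowed.
function isAuthorized(
  adminToken: string | undefined, // value of the ADMIN_TOKEN env var
  isWrite: boolean,               // is this a write route?
  authHeader: string | null,      // the request's Authorization header
): boolean {
  if (!adminToken || !isWrite) return true; // open mode, or read-only route
  return authHeader === `Bearer ${adminToken}`;
}
```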
These variables live at module scope in vectordb.ts and reset on cold start:
| Variable | Purpose | Reset on cold start? |
|---|---|---|
| `RESOLVED_DIM` | Cached embedding dimension | Yes — re-read from `vectordb_meta` |
| `FTS_AVAILABLE` | Whether FTS5 is usable | Yes — re-detected on first call |
| `_setupDone` | Setup guard flag | Yes — setup re-runs once per process |
Cold starts in Val Town happen frequently. These caches only help within a warm invocation window, which is fine — they prevent redundant work within a single request chain, not across requests.
The table schema hardcodes the vector dimension at creation time (F32_BLOB(4096)). Changing providers or models requires:
1. `POST /export` to save text + meta
2. `POST /clear?confirm=yes` to wipe the DB
3. Update env vars
4. `POST /import` to re-embed everything
EMBEDDING_DIM=auto helps by detecting the dimension dynamically, but once the table exists you can't change it without a full reset.
The built-in chunkText() function splits on character count with a whitespace-finding heuristic. It doesn't respect sentence boundaries, paragraphs, or semantic units. For production RAG use cases, consider pre-chunking text before inserting.
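A minimal sentence-aware pre-chunker, as suggested above: split on sentence boundaries, then pack sentences greedily up to a size budget. This is an illustrative sketch, not the built-in `chunkText()`, and the regex is a crude heuristic (it mishandles abbreviations like "e.g."):

```typescript
// Split text into chunks of at most maxChars characters, breaking only at
// sentence boundaries (., !, ?) where possible.
function chunkBySentence(text: string, maxChars: number): string[] {
  // Sentences: runs of non-terminators followed by terminators, or a trailing
  // fragment with no terminator.
  const sentences = text.match(/[^.!?]+[.!?]+(\s|$)|[^.!?]+$/g) ?? [text];
  const chunks: string[] = [];
  let current = "";
  for (const s of sentences) {
    // Flush the current chunk if adding this sentence would exceed the budget.
    if (current && current.length + s.length > maxChars) {
      chunks.push(current.trim());
      current = "";
    }
    current += s;
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}
```

Running chunks through this before `upsertMany` keeps each record semantically coherent, which generally improves retrieval quality over fixed-size character splits.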
The hybrid mode re-ranks vector results using BM25 keyword scores. It does not perform a full union of vector + keyword candidates. Records that are keyword matches but outside the vector top-K are not surfaced.
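That re-rank-only behavior can be sketched as follows (score shapes and the `alpha` weight are illustrative; the point is that only vector hits are candidates):

```typescript
interface VectorHit {
  id: string;
  similarity: number; // cosine similarity from vector_top_k
}

// Re-rank the vector top-K by blending in a BM25 keyword score. A record
// with a strong keyword score but no vector hit never appears -- there is
// no union step.
function hybridRerank(
  vectorTopK: VectorHit[],
  bm25: Map<string, number>, // keyword score per id (higher is better)
  alpha = 0.7,               // weight on the vector score
): VectorHit[] {
  return vectorTopK
    .map((h) => ({
      id: h.id,
      similarity: alpha * h.similarity + (1 - alpha) * (bm25.get(h.id) ?? 0),
    }))
    .sort((a, b) => b.similarity - a.similarity);
}
```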
Pagination is offset-based (offset param on /list and /search). For large datasets with frequent inserts, results can shift between pages.
All records share one vectordb table. There's no multi-tenant or namespaced storage. If you need logical separation, use the prefix param on /list and metadata filters on /search, or fork and deploy separate instances.
Measured with Nebius Qwen/Qwen3-Embedding-8B (4096 dims), 105 records:
| Operation | Latency |
|---|---|
| Embed 10 records (1 batch) | ~1.2s |
| `upsertMany` 10 records | ~1.4s total |
| `search` (vector only) | <100ms |
| `search` (hybrid) | ~150ms |
| `setup()` cold (first call) | ~200ms |
| `setup()` warm (guarded) | <1ms |
Storage:
- ~22 KB per record (4096 dims × 4 bytes + float8 compression + text + overhead)
- ~47,500 records per 1 GB
Prioritized by impact:
- Namespace/collection support — partition records by collection name (table prefix or extra column + index)
- Hybrid union retrieval — merge vector candidates and keyword candidates before ranking
- Async embed queue — background embedding via interval val, for non-blocking imports
- Sentence-aware chunking — use sentence boundaries for better chunk quality
- Cursor pagination — stable pagination using an `updated_at` + `id` cursor
- Webhook on upsert — fire a webhook after batch upserts complete
- Multi-index — different embedding models in the same DB
- Range filters — `meta.date > "2024-01-01"` style filtering
- Delete by filter — delete all records matching metadata criteria
- Rate limiting — per-IP limits on the search endpoint
| Variable | Default | Description |
|---|---|---|
| `ADMIN_TOKEN` | — | Enables auth for write ops |
| `EMBEDDING_PROVIDER` | `nebius` | Provider preset: `nebius`, `openai`, `openrouter` |
| `NEBIUS_API_KEY` | — | Nebius key (used when provider=nebius) |
| `OPENAI_API_KEY` | — | OpenAI key (used when provider=openai) |
| `OPENROUTER_API_KEY` | — | OpenRouter key (used when provider=openrouter) |
| `EMBEDDING_API_URL` | (preset) | Override API URL |
| `EMBEDDING_API_KEY` | — | Generic fallback key |
| `EMBEDDING_MODEL` | (preset) | Override model name |
| `EMBEDDING_DIM` | (preset) | Override dimensions, or `auto` |
| `INDEX_METRIC` | `cosine` | `cosine` or `l2` |
| `INDEX_MAX_NEIGHBORS` | `64` | Graph degree (8–256) |
| `INDEX_COMPRESS_NEIGHBORS` | `float8` | `float8`, `float16`, `floatb16`, `float32`, `float1bit`, `none` |
| `INDEX_ALPHA` | `1.2` | DiskANN density (≥1) |
| `INDEX_SEARCH_L` | `200` | Query-time effort |
| `INDEX_INSERT_L` | `70` | Insert-time effort |
| `ALLOW_WRITE_TESTS` | `0` | Enable `/validate?write=yes` |
| `ALLOW_WRITE_TESTS_NOAUTH` | `0` | Skip auth for write tests |
- Val: https://www.val.town/x/kamenxrider/slimarmor
- API: https://kamenxrider--95fbe492ffe111f0bee942dde27851f2.web.val.run
- Browser CLI: https://kamenxrider--95fbe492ffe111f0bee942dde27851f2.web.val.run/ui
- Module import: `https://esm.town/v/kamenxrider/slimarmor/vectordb.ts`