πŸ€– Build a RAG Chatbot with SlimArmor + Val Town's Free OpenAI Proxy

A complete, copy-paste guide to building a chatbot that answers questions from your own documents β€” running entirely on Val Town, with no LLM API costs.


What You'll Build

A chatbot val that:

  • Remembers your documents (support articles, notes, product docs, anything)
  • Answers questions based on what's in those documents
  • Maintains conversation history (multi-turn chat β€” it remembers what was said earlier in the thread)
  • Has a real chat UI in the browser
  • Costs nothing for the LLM (thanks to Val Town's free OpenAI proxy)
  • Costs nothing to host (it's a Val Town val)

The only thing you pay for is the embedding API (Nebius has a generous free tier).


How RAG Works (2-minute explainer)

LLMs like GPT-4 are powerful but they only know what was in their training data. They don't know your support docs, your internal wiki, or anything you wrote after their cutoff date.

RAG (Retrieval-Augmented Generation) fixes this by adding a memory layer:

Without RAG:
  User: "What's your refund policy?"
  LLM: "I don't have access to that information."

With RAG:
  User: "What's your refund policy?"
    β†’ Search SlimArmor for relevant docs
    β†’ Find: "Refunds available within 30 days..."
    β†’ Inject into LLM prompt as context
  LLM: "Based on our policy, you can get a refund within 30 days of purchase."

The LLM doesn't need to "know" your docs β€” it just needs to read them at the moment of answering. SlimArmor handles finding the right docs. Val Town's free OpenAI proxy handles the answering.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     question      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚    User     β”‚ ────────────────▢ β”‚                  β”‚
β”‚  (browser)  β”‚                   β”‚   chatbot.ts     β”‚
β”‚             β”‚ ◀──────────────── β”‚   (your val)     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜      answer       β”‚                  β”‚
                                  β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜
                                         β”‚     β”‚
                              search     β”‚     β”‚  prompt +
                              docs       β”‚     β”‚  context
                                         β–Ό     β–Ό
                                  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”
                                  β”‚Slim-   β”‚ β”‚Val Townβ”‚
                                  β”‚Armor   β”‚ β”‚OpenAI  β”‚
                                  β”‚(memory)β”‚ β”‚(free)  β”‚
                                  β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Prerequisites

  1. A SlimArmor instance β€” fork kamenxrider/slimarmor and set your NEBIUS_API_KEY (see GUIDE.md Part 1)
  2. A Val Town account β€” free tier works fine for the chatbot val itself
  3. That's it β€” the LLM is free via Val Town's built-in proxy

Val Town Pro note: The free OpenAI proxy gives you unlimited access to gpt-4o-mini and gpt-4.1-nano. Pro users also get 10 requests/day to more powerful models. For a chatbot this is plenty.


Step 1 β€” Ingest Your Documents

Before the chatbot can answer questions, you need to load your content into SlimArmor. Do this once (and re-run whenever your docs change β€” unchanged content is skipped automatically).

Create a new Script val called ingest-docs and run it:

// ingest-docs.ts
// Run this once to load your documents into SlimArmor.
// Re-run anytime your content changes — unchanged docs are skipped for free.

const SLIMARMOR_ENDPOINT = "https://YOUR_SLIMARMOR_ENDPOINT";
const SLIMARMOR_TOKEN = Deno.env.get("SLIMARMOR_TOKEN") ?? "";

// ─── PUT YOUR CONTENT HERE ───────────────────────────────────────────────────
// For short content (paragraphs, FAQs, articles under ~500 words): use /upsert
// For long content (full docs, blog posts, manuals): use /upsert_chunked
const docs = [
  {
    id: "faq-returns",
    text: "Our return policy allows returns within 30 days of purchase for unused items in original packaging. Digital products and gift cards are non-refundable. To start a return, email returns@example.com with your order number.",
    meta: { category: "policy", topic: "returns" },
  },
  {
    id: "faq-shipping",
    text: "We ship to over 50 countries. Standard shipping takes 5–7 business days and costs $4.99. Express shipping (2–3 days) costs $12.99. Orders over $75 qualify for free standard shipping. Tracking numbers are emailed within 24 hours of dispatch.",
    meta: { category: "policy", topic: "shipping" },
  },
  {
    id: "faq-account",
    text: "To create an account, click Sign Up on the homepage and enter your email and a password. You'll receive a verification email — click the link to activate. You can reset your password anytime from the login page.",
    meta: { category: "account", topic: "setup" },
  },
  {
    id: "faq-pricing",
    text: "We offer three plans: Free (up to 3 projects, 1 GB storage), Pro ($12/month, unlimited projects, 50 GB storage, priority support), and Enterprise (custom pricing, SSO, dedicated support). All paid plans include a 14-day free trial.",
    meta: { category: "pricing", topic: "plans" },
  },
];

const res = await fetch(`${SLIMARMOR_ENDPOINT}/upsert`, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${SLIMARMOR_TOKEN}`,
  },
  body: JSON.stringify(docs),
});

const result = await res.json();
console.log(`✅ Done: ${result.embedded} embedded, ${result.skipped} skipped (unchanged)`);

For longer documents (blog posts, manuals, full articles), use /upsert_chunked instead β€” it automatically splits text into overlapping chunks so each chunk is searchable:

// For a long document — replace the fetch above with this:
const longDoc = `...your full document text here...`;

const res = await fetch(`${SLIMARMOR_ENDPOINT}/upsert_chunked`, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${SLIMARMOR_TOKEN}`,
  },
  body: JSON.stringify({
    id: "docs-user-manual",
    text: longDoc,
    meta: { source: "user-manual", version: "2.1" },
    chunkSize: 600, // ~600 chars per chunk
    overlap: 80,    // 80 chars of overlap between chunks (preserves context at boundaries)
  }),
});
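To build intuition for what chunkSize and overlap do, here is a simplified sketch of overlapping chunking. This is an illustration only, not SlimArmor's actual splitter (which may respect sentence or word boundaries):

```typescript
// Simplified sketch of overlapping chunking (NOT SlimArmor's exact algorithm).
// Each chunk starts (chunkSize - overlap) characters after the previous one,
// so text near a boundary appears in two chunks and stays searchable in context.
function chunkText(text: string, chunkSize = 600, overlap = 80): string[] {
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached the end
  }
  return chunks;
}
```

With chunkSize 600 and overlap 80, each new chunk starts 520 characters after the previous one, so the last 80 characters of one chunk reappear at the start of the next.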

Step 2 β€” Build the Chatbot Val

Create a new HTTP val called my-chatbot. This is the complete chatbot β€” it serves the UI, handles chat messages, searches SlimArmor, and calls the LLM.

// my-chatbot.ts (HTTP val)
import { OpenAI } from "https://esm.town/v/std/openai";

// ─── CONFIG ──────────────────────────────────────────────────────────────────
const SLIMARMOR_ENDPOINT = "https://YOUR_SLIMARMOR_ENDPOINT";

const SYSTEM_PROMPT = `You are a helpful support assistant for Acme Inc.
Answer questions using ONLY the context provided below each message.
If the answer isn't in the context, say "I don't have that information in my knowledge base — please contact support@example.com."
Be concise. Use plain language. Don't make things up.`;

// ─── TYPES ───────────────────────────────────────────────────────────────────
interface Message {
  role: "user" | "assistant" | "system";
  content: string;
}

interface ChatRequest {
  message: string;
  history?: Message[];
}

// ─── RAG: RETRIEVE RELEVANT CONTEXT ──────────────────────────────────────────
async function retrieveContext(query: string): Promise<{ context: string; sources: string[] }> {
  const res = await fetch(`${SLIMARMOR_ENDPOINT}/search`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      query,
      k: 4,              // top 4 most relevant chunks
      maxDistance: 0.68, // tune this with /calibrate — lower = stricter
      hybrid: { enabled: true, alpha: 0.15 }, // slight keyword boost for exact terms
    }),
  });
  if (!res.ok) return { context: "", sources: [] };

  const { results } = await res.json();
  if (!results || results.length === 0) {
    return { context: "", sources: [] };
  }

  const context = results
    .map((r: any, i: number) => `[${i + 1}] ${r.text}`)
    .join("\n\n");
  const sources = results.map((r: any) => r.id);
  return { context, sources };
}

// ─── CHAT HANDLER ────────────────────────────────────────────────────────────
async function handleChat(req: Request): Promise<Response> {
  const body: ChatRequest = await req.json();
  const { message, history = [] } = body;
  if (!message?.trim()) {
    return Response.json({ error: "No message provided" }, { status: 400 });
  }

  // 1. Retrieve relevant context from SlimArmor
  const { context, sources } = await retrieveContext(message);

  // 2. Build the message history for the LLM.
  //    Inject context into the latest user message so the LLM can "read" the docs.
  const contextBlock = context
    ? `\n\n---\nRelevant knowledge base context:\n${context}\n---`
    : "\n\n---\nNo relevant context found in the knowledge base.\n---";

  const messages: Message[] = [
    { role: "system", content: SYSTEM_PROMPT },
    // Include conversation history (so the bot remembers earlier messages)
    ...history.slice(-6), // last 6 messages = ~3 turns of conversation
    // Latest user message with context appended
    { role: "user", content: message + contextBlock },
  ];

  // 3. Call the LLM via Val Town's free OpenAI proxy — no API key needed!
  const openai = new OpenAI();
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini", // free on Val Town — fast and good for Q&A
    messages,
    max_tokens: 512,
    temperature: 0.3, // lower = more factual, less creative
  });

  const reply = completion.choices[0].message.content ??
    "Sorry, I couldn't generate a response.";
  return Response.json({ reply, sources });
}

// ─── UI ──────────────────────────────────────────────────────────────────────
function renderUI(): Response {
  const html = `<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>AI Assistant</title>
  <style>
    * { box-sizing: border-box; margin: 0; padding: 0; }
    body { font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", sans-serif;
           background: #0f0f0f; color: #e8e8e8; height: 100dvh;
           display: flex; flex-direction: column; }
    header { padding: 14px 20px; border-bottom: 1px solid #222; font-weight: 600;
             font-size: 15px; display: flex; align-items: center; gap: 10px; background: #111; }
    header span.dot { width: 8px; height: 8px; border-radius: 50%;
                      background: #22c55e; display: inline-block; }
    #messages { flex: 1; overflow-y: auto; padding: 20px;
                display: flex; flex-direction: column; gap: 16px; }
    .msg { max-width: 80%; padding: 12px 16px; border-radius: 12px;
           font-size: 14px; line-height: 1.55; white-space: pre-wrap; }
    .msg.user { background: #1d4ed8; color: #fff; align-self: flex-end;
                border-bottom-right-radius: 4px; }
    .msg.assistant { background: #1e1e1e; border: 1px solid #2a2a2a;
                     align-self: flex-start; border-bottom-left-radius: 4px; }
    .msg.assistant .sources { margin-top: 10px; padding-top: 8px;
                              border-top: 1px solid #333; font-size: 11px; color: #666; }
    .msg.thinking { background: #1e1e1e; border: 1px solid #2a2a2a; align-self: flex-start;
                    color: #555; font-style: italic; font-size: 13px; }
    .empty-state { flex: 1; display: flex; flex-direction: column; align-items: center;
                   justify-content: center; color: #444; gap: 8px;
                   text-align: center; padding: 40px; }
    .empty-state h2 { color: #666; font-size: 18px; }
    .empty-state p { font-size: 13px; max-width: 300px; }
    #input-area { padding: 16px 20px; border-top: 1px solid #222; background: #111;
                  display: flex; gap: 10px; align-items: flex-end; }
    #input { flex: 1; background: #1a1a1a; border: 1px solid #2a2a2a; border-radius: 10px;
             padding: 10px 14px; color: #e8e8e8; font-size: 14px; resize: none;
             outline: none; max-height: 120px; font-family: inherit; line-height: 1.4; }
    #input:focus { border-color: #444; }
    #send { background: #1d4ed8; border: none; border-radius: 10px; padding: 10px 16px;
            color: white; cursor: pointer; font-size: 14px; font-weight: 500;
            white-space: nowrap; transition: background 0.15s; flex-shrink: 0; }
    #send:hover { background: #1e40af; }
    #send:disabled { background: #333; cursor: not-allowed; }
  </style>
</head>
<body>
  <header><span class="dot"></span> AI Assistant</header>
  <div id="messages">
    <div class="empty-state">
      <h2>👋 Ask me anything</h2>
      <p>I'll search the knowledge base and answer based on what I find.</p>
    </div>
  </div>
  <div id="input-area">
    <textarea id="input" rows="1" placeholder="Ask a question..." autofocus></textarea>
    <button id="send">Send</button>
  </div>
  <script>
    const messagesEl = document.getElementById('messages');
    const inputEl = document.getElementById('input');
    const sendBtn = document.getElementById('send');
    let history = [];

    function addMessage(role, content, sources) {
      // Clear empty state on first message
      const emptyState = messagesEl.querySelector('.empty-state');
      if (emptyState) emptyState.remove();

      const div = document.createElement('div');
      div.className = \`msg \${role}\`;
      div.textContent = content;
      if (sources && sources.length > 0) {
        const src = document.createElement('div');
        src.className = 'sources';
        src.textContent = '📚 Sources: ' + sources.join(', ');
        div.appendChild(src);
      }
      messagesEl.appendChild(div);
      messagesEl.scrollTop = messagesEl.scrollHeight;
      return div;
    }

    async function send() {
      const message = inputEl.value.trim();
      if (!message) return;
      inputEl.value = '';
      inputEl.style.height = 'auto';
      sendBtn.disabled = true;

      addMessage('user', message);
      const thinking = addMessage('thinking', '⏳ Searching knowledge base...');

      try {
        const res = await fetch('/chat', {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({ message, history }),
        });
        const data = await res.json();
        thinking.remove();

        if (data.error) {
          addMessage('assistant', '❌ Error: ' + data.error);
        } else {
          addMessage('assistant', data.reply, data.sources);
          // Keep history for multi-turn conversation
          history.push({ role: 'user', content: message });
          history.push({ role: 'assistant', content: data.reply });
          // Trim to last 10 messages to avoid bloating requests
          if (history.length > 10) history = history.slice(-10);
        }
      } catch (err) {
        thinking.remove();
        addMessage('assistant', '❌ Network error. Please try again.');
      }

      sendBtn.disabled = false;
      inputEl.focus();
    }

    sendBtn.addEventListener('click', send);
    inputEl.addEventListener('keydown', (e) => {
      if (e.key === 'Enter' && !e.shiftKey) {
        e.preventDefault();
        send();
      }
    });
    // Auto-resize textarea
    inputEl.addEventListener('input', () => {
      inputEl.style.height = 'auto';
      inputEl.style.height = Math.min(inputEl.scrollHeight, 120) + 'px';
    });
  </script>
</body>
</html>`;
  return new Response(html, { headers: { "Content-Type": "text/html" } });
}

// ─── MAIN HANDLER ────────────────────────────────────────────────────────────
export default async function(req: Request): Promise<Response> {
  const url = new URL(req.url);
  if (req.method === "POST" && url.pathname === "/chat") {
    return handleChat(req);
  }
  return renderUI();
}

Deploy this val and visit its URL β€” you have a working chatbot. πŸŽ‰


Step 3 β€” Customize It

Change the persona

Edit SYSTEM_PROMPT to match your use case:

// Support bot
const SYSTEM_PROMPT = `You are a helpful support assistant for Acme Inc.
Answer questions using ONLY the context provided. Be concise and friendly.
If the answer isn't in the context, say so and suggest contacting support.`;

// Internal knowledge bot
const SYSTEM_PROMPT = `You are an internal assistant for the engineering team.
Answer questions based on the provided documentation.
Use precise technical language. Quote the relevant section when helpful.`;

// Sales assistant
const SYSTEM_PROMPT = `You are a product expert helping potential customers.
Answer questions based on the provided product information.
Be enthusiastic but accurate. Never make up pricing or features.`;

Tune retrieval quality

Adjust the search parameters in retrieveContext():

body: JSON.stringify({
  query,
  k: 4,              // increase to 6-8 for broader context (longer prompts)
  maxDistance: 0.68, // decrease (e.g. 0.5) to be stricter — fewer but more relevant results
                     // increase (e.g. 0.75) if the bot says "no info" too often
  hybrid: { enabled: true, alpha: 0.15 }, // increase alpha if your docs have specific terms/codes
}),

Not sure what maxDistance to set? Run this in your browser:

https://YOUR_SLIMARMOR_ENDPOINT/calibrate?q=a+typical+user+question

It tells you the tight, balanced, and loose thresholds for your actual data.
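To make the threshold concrete, here is a sketch of what the maxDistance cutoff does. SlimArmor applies this filter server-side; `applyMaxDistance`, the `Hit` shape, and the distances below are illustrative, not part of its API:

```typescript
// Sketch of the maxDistance cutoff. Lower distance = more similar to the query.
interface Hit { id: string; distance: number; }

function applyMaxDistance(hits: Hit[], maxDistance: number): Hit[] {
  return hits.filter((h) => h.distance <= maxDistance);
}

// Hypothetical search results, sorted by similarity:
const hits: Hit[] = [
  { id: "faq-returns", distance: 0.42 },  // strong match
  { id: "faq-shipping", distance: 0.66 }, // borderline
  { id: "faq-pricing", distance: 0.81 },  // likely irrelevant
];

// A strict threshold keeps only the strong match:
applyMaxDistance(hits, 0.5).map((h) => h.id);  // ["faq-returns"]
// The guide's default of 0.68 also keeps the borderline result:
applyMaxDistance(hits, 0.68).map((h) => h.id); // ["faq-returns", "faq-shipping"]
```

The /calibrate endpoint essentially tells you where these cutoffs should sit for your own data.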

Use a more powerful model

The free proxy gives you gpt-4o-mini at no cost. For harder questions, switch models:

// Free (unlimited on Val Town):
model: "gpt-4o-mini"  // best free option — fast, smart, good for Q&A
model: "gpt-4.1-nano" // even faster, slightly less capable
model: "gpt-5-nano"   // latest nano — free tier

// Pro users only (10 requests/24h before fallback):
model: "gpt-4o"       // most capable — use for complex reasoning
model: "gpt-4.1"      // latest high-quality model

// Or bring your own key (no limits):
// Add OPENAI_API_KEY env var, then:
import { OpenAI } from "npm:openai"; // uses your key automatically

Add source citations in the UI

The /chat endpoint already returns sources (an array of SlimArmor record IDs). The UI displays them as "πŸ“š Sources: doc-1, doc-2". To show friendlier source names, store a title field in your metadata:

// When ingesting:
{ id: "faq-shipping", text: "...", meta: { title: "Shipping Policy", url: "/help/shipping" } }

// In retrieveContext(), return richer source info:
const sources = results.map((r: any) => ({
  id: r.id,
  title: r.meta?.title ?? r.id,
  url: r.meta?.url,
}));

// In the UI, render as a link:
src.innerHTML = '📚 ' + sources.map(s =>
  s.url ? \`<a href="\${s.url}" target="_blank">\${s.title}</a>\` : s.title
).join(', ');

Step 4 β€” Keep Your Docs Fresh

Re-run your ingest script whenever your content changes. SlimArmor's content-hash deduplication means only changed docs get re-embedded β€” the rest are skipped instantly at no cost.

// Re-ingest is safe to run anytime.
// If "Refund Policy" text didn't change → skipped (no API call)
// If "Shipping Policy" changed → re-embedded automatically
console.log(`✅ Done: ${result.embedded} embedded, ${result.skipped} skipped`);
// → "✅ Done: 1 embedded, 3 skipped"
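Conceptually, the dedup works like the sketch below: hash each doc's text, compare against the hash stored at last ingest, and only embed when they differ. The hash function here (FNV-1a) is purely for illustration; SlimArmor's actual hashing is an internal detail:

```typescript
// Illustration of content-hash dedup (FNV-1a used only for the sketch).
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // FNV prime, kept as uint32
  }
  return hash;
}

// Hashes from the previous ingest, keyed by doc id
const stored = new Map<string, number>();

function needsEmbedding(id: string, text: string): boolean {
  const h = fnv1a(text);
  if (stored.get(id) === h) return false; // unchanged → skip, no embedding API call
  stored.set(id, h);                      // new or changed → record and re-embed
  return true;
}
```

Calling `needsEmbedding` twice with identical text returns true then false; change even one character and it returns true again, which is why re-running the ingest script only costs API calls for docs that actually changed.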

For production, hook this into your CMS or content pipeline:

  • Webhook from Notion/Contentful/Sanity on publish β†’ trigger ingest val
  • Or run a scheduled interval val that re-ingests nightly

How Context Injection Works (Under the Hood)

When a user sends a message, here's exactly what the LLM receives:

[system]
You are a helpful support assistant for Acme Inc.
Answer questions using ONLY the context provided...

[user - message 1]
What's your return policy?

[assistant - message 1]
You can return unused items within 30 days...

[user - message 2]  ← latest message, with context injected
Can I return a digital product?

---
Relevant knowledge base context:
[1] Our return policy allows returns within 30 days of purchase for unused items...
    Digital products and gift cards are non-refundable after download...
[2] To start a return, email returns@example.com with your order number...
---

The LLM sees the full conversation history plus the freshly retrieved context for the latest question. It answers using that context, not its training data. That's RAG.
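The assembly step can be isolated as a pure function. This mirrors the logic already shown in handleChat; `buildMessages` is just a name for this sketch:

```typescript
// Build the exact message array the LLM receives: system prompt, recent
// history, then the newest user message with retrieved context appended.
interface Message { role: "user" | "assistant" | "system"; content: string; }

function buildMessages(
  systemPrompt: string,
  history: Message[],
  userMessage: string,
  context: string, // "" when retrieval found nothing
): Message[] {
  const contextBlock = context
    ? `\n\n---\nRelevant knowledge base context:\n${context}\n---`
    : "\n\n---\nNo relevant context found in the knowledge base.\n---";
  return [
    { role: "system", content: systemPrompt },
    ...history.slice(-6), // keep the last ~3 turns
    { role: "user", content: userMessage + contextBlock },
  ];
}
```

Note that context is re-retrieved for every message and attached only to the latest one, so old turns never carry stale context into the prompt.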


Common Issues

Bot says "I don't have that information" even though you've ingested the docs β†’ Your maxDistance is too strict. Try raising it from 0.68 to 0.72, or run /calibrate to find the right value.

Bot makes things up / ignores the context β†’ Make your system prompt stronger: "You MUST answer ONLY from the provided context. Do not use any prior knowledge." Also lower temperature to 0.1.

Bot repeats the same answer regardless of question β†’ Your docs are too similar to each other. Make sure you've ingested varied content covering different topics.

Responses are slow β†’ Normal β€” there's a round-trip to SlimArmor (~100ms) plus the LLM (~500-1500ms). For snappier UX, show the "⏳ Searching..." indicator (already in the UI above).

"Embedding API error 401" when ingesting β†’ Your NEBIUS_API_KEY (or whichever provider) is missing or expired. The chatbot LLM side still works β€” only ingestion is broken.


Full Architecture Recap

User types message
      β”‚
      β–Ό
POST /chat  (my-chatbot.ts)
      β”‚
      β”œβ”€β–Ά POST /search  (SlimArmor)
      β”‚       Uses Nebius embeddings to find relevant docs
      β”‚       Returns top 4 chunks + their IDs
      β”‚
      └─▢ openai.chat.completions.create()  (Val Town free proxy)
              Sends: system prompt + history + user message + context
              Returns: answer text
      β”‚
      β–Ό
Response: { reply: "...", sources: ["doc-1", "doc-2"] }
      β”‚
      β–Ό
UI renders answer + source list

Two vals, one chatbot:

Val                     Purpose                  Cost
slimarmor (your fork)   Stores + searches docs   Nebius free tier
my-chatbot              Chat UI + RAG logic      Free (LLM via Val Town proxy)

Next Steps

  • Add auth β€” protect your chatbot with a password or login using Val Town's session tools
  • Log conversations β€” store chat history in SlimArmor itself (or a separate SQLite table) for analytics
  • Multi-tenant β€” use meta.tenant_id filters to serve different knowledge bases to different users from one instance
  • Slack / Discord bot β€” replace the UI handler with a webhook handler for your messaging platform
  • Streaming responses β€” use openai.chat.completions.create({ stream: true }) for token-by-token output (see Val Town streaming examples)