What if building a language model was like playing with LEGOs?
This is a tiny character-level language model that learns to invent new baby names, built from scratch with Conal Elliott-style reverse-mode automatic differentiation. No PyTorch. No TensorFlow. Just TypeScript, Float32Arrays, and vibes.
It trains in ~1 second on 80 names and then dreams up new ones.
Imagine you're a kid trying to guess the next letter in a name:
"e", "m", "m" โ probably "a" (emma!)
"s", "o", "p" โ probably "h" (sophia!)
This program learns those patterns by:
- Looking at lots of real names
- Turning each letter into a secret code (an embedding)
- Mixing the codes together to guess the next letter
- Getting told "nope, wrong!" (the loss)
- Tracing backward through the math to figure out how to do better (the backprop)
- Nudging all the knobs a little bit (the gradient descent)
- Repeating thousands of times in one second
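The nudge-the-knobs step can be sketched on its own. Here's a minimal, hypothetical example (none of these names come from main.ts) that trains a single knob `w` by gradient descent to minimize `(w - 3)^2`, with the derivative written out by hand:

```typescript
// Illustrative sketch of gradient descent on one knob; not from main.ts.
let w = 0;        // one trainable knob, starting at a bad guess
const lr = 0.1;   // learning rate: how big each nudge is

for (let step = 0; step < 100; step++) {
  const loss = (w - 3) ** 2;  // "nope, wrong!" — how far off we are
  const grad = 2 * (w - 3);   // what backprop would hand us: d(loss)/dw
  w -= lr * grad;             // nudge the knob downhill
}

console.log(w.toFixed(3)); // prints "3.000" — the knob found the answer
```

The real model does exactly this, just for thousands of knobs at once, with the gradients delivered by the backpropagators instead of by hand.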
Most ML frameworks treat autodiff as a graph-rewriting compiler pass. Conal Elliott's insight is far more elegant: the derivative is just a function. Every computation returns a pair:
(value, backpropagator)
The backpropagator is a first-class function that, given the upstream sensitivity, pushes gradient to its inputs. Composition of programs gives composition of backpropagators: no tape, no graph, no magic.
Here, that shows up as the Node<T> class:
```ts
class Node<T> {
  v: T;                      // the value we computed
  back: (g: number) => void; // "hey inputs, here's how much you matter"
}
```
ELI5: A Node is like a kid who knows their answer AND knows who to blame if the answer is wrong.
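As an illustrative sketch (this `Node`, `Param`, and `mul` are re-created for this README, not lifted from main.ts), a primitive computes its value eagerly and its `back` closes over the inputs:

```typescript
// Sketch of the Node pattern; illustrative, not the exact main.ts code.
class Node<T> {
  constructor(
    public v: T,                      // the value we computed
    public back: (g: number) => void, // push gradient to our inputs
  ) {}
}

// A trainable scalar: its back() just accumulates into a gradient bucket.
class Param {
  grad = 0;
  constructor(public v: number) {}
  node(): Node<number> {
    return new Node(this.v, (g) => { this.grad += g; });
  }
}

// One primitive: multiply. Its backpropagator applies the product rule.
function mul(a: Node<number>, b: Node<number>): Node<number> {
  return new Node(a.v * b.v, (g) => {
    a.back(g * b.v); // d(ab)/da = b
    b.back(g * a.v); // d(ab)/db = a
  });
}

const x = new Param(3);
const y = new Param(4);
const z = mul(x.node(), y.node()); // z.v === 12
z.back(1.0);                       // x.grad === 4, y.grad === 3
```

Calling `z.back(1.0)` is the whole "trace back the blame" step: each knob ends up knowing exactly how much it mattered.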
The model is a bag-of-embeddings predictor (think: the simplest possible LLM).
Every primitive follows the same shape; this is the whole trick.
No tape. No graph object. Just functions calling functions.
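To make "functions calling functions" concrete, here's a hedged sketch (these helpers are illustrative, not the main.ts definitions) of two primitives composing, where calling `back` on the output runs the chain rule in reverse with no bookkeeping at all:

```typescript
// Illustrative: primitives in the (value, backpropagator) style, composed
// directly. No tape, no graph object — composition does all the work.
type N = { v: number; back: (g: number) => void };

// A leaf writes its gradient into a shared bucket when blamed.
const leaf = (v: number, grads: Record<string, number>, name: string): N => ({
  v,
  back: (g) => { grads[name] = (grads[name] ?? 0) + g; },
});

const add = (a: N, b: N): N => ({
  v: a.v + b.v,
  back: (g) => { a.back(g); b.back(g); }, // d(a+b)/da = d(a+b)/db = 1
});

const mul = (a: N, b: N): N => ({
  v: a.v * b.v,
  back: (g) => { a.back(g * b.v); b.back(g * a.v); }, // product rule
});

// y = a*b + c — calling y.back(1) unwinds the composition in reverse.
const grads: Record<string, number> = {};
const a = leaf(2, grads, "a");
const b = leaf(5, grads, "b");
const c = leaf(7, grads, "c");
const y = add(mul(a, b), c);
y.back(1.0);
console.log(y.v, grads); // 17 { a: 5, b: 2, c: 1 }
```

Nesting `add(mul(a, b), c)` automatically nests the backpropagators the same way: the structure of the program *is* the structure of the reverse pass.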
| Concept | Code | ELI5 |
|---|---|---|
| Node<T> | class Node<T> | A value that knows who to blame |
| Param | class Param | A trainable knob with a gradient bucket |
| gatherRows | E[ctx] → (B,K,D) | "Look up the secret codes for these letters" |
| sumOverK | Σ_k emb → (B,D) | "Mix the codes together" |
| matmul_W_h | h @ Wᵀ → (B,V) | "Score every possible next letter" |
| xentFromLogits | softmax + -log(p) | "How surprised were we by the right answer?" |
| loss.back(1.0) | reverse-mode AD | "OK everyone, trace back the blame!" |
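The last two rows can be sketched together. This standalone function is illustrative (the real `xentFromLogits` works on batches and returns a `Node`); it computes softmax cross-entropy for one example and its textbook gradient, `p - onehot(y)`:

```typescript
// Illustrative softmax cross-entropy for a single example; not from main.ts.
// Returns the loss and the gradient with respect to the logits.
function xent(
  logits: Float32Array,
  y: number, // index of the correct next letter
): { loss: number; grad: Float32Array } {
  const m = Math.max(...logits); // subtract the max for numerical stability
  const exps = Float32Array.from(logits, (z) => Math.exp(z - m));
  const Z = exps.reduce((s, e) => s + e, 0);
  const p = Float32Array.from(exps, (e) => e / Z); // softmax probabilities
  const loss = -Math.log(p[y]); // "how surprised were we by the right answer?"
  const grad = Float32Array.from(p);
  grad[y] -= 1; // d(loss)/d(logits) = p - onehot(y)
  return { loss, grad };
}
```

With uniform logits over three letters, the loss is `ln 3 ≈ 1.0986` and the gradient pushes the correct letter's score up and the others down, which is exactly the signal `loss.back(1.0)` feeds backward through the rest of the model.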
```sh
# In Val Town: just hit Run on main.ts
# Locally:
npx tsx main.ts
```
Output (something like):
```
Elapsed: 1.000 seconds
Steps: 4200
Final loss: 2.1234
Samples:
1 : arielle
2 : mavia
3 : elina
4 : sova
5 : nalia
...
```
- Conal Elliott, *The Simple Essence of Automatic Differentiation*: the paper that inspired this style
- Andrej Karpathy, *makemore*: a similar char-level model in Python
- 3Blue1Brown, *Neural Networks*: visual intuition for backprop
Built with zero dependencies. Just math, types, and the Conal Elliott conviction that derivatives are functions, not data structures.
