What if building a language model was like playing with LEGOs?
This is a character-level language model that learns to invent new baby names, built from scratch with Conal Elliott-style reverse-mode automatic differentiation. No PyTorch. No TensorFlow. Just TypeScript, Float32Arrays, and vibes.
It trains in ~10 seconds on all 32,033 names from Karpathy's makemore dataset and then dreams up new ones like "aylies", "marya", "laina", and "kari".
Imagine you're a kid trying to guess the next letter in a name:
"e", "m", "m" โ probably "a" (emma!)
"s", "o", "p" โ probably "h" (sophia!)
This program learns those patterns by:
- Looking at 32K real names (fetched live from GitHub!)
- Turning each letter into a secret code (an embedding)
- Gluing 3 codes together into one long vector (the concat)
- Passing them through a hidden brain layer with tanh squishing
- Scoring every possible next letter
- Getting told "nope, wrong!" (the loss)
- Tracing backward through the math to figure out how to do better (the backprop)
- Nudging all the knobs a little bit (the gradient descent)
- Repeating ~3,400 times in ten seconds
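The whole loop above can be sketched on a toy problem: minimize (w - 3)^2 in the same value-plus-backpropagator style. Names here are illustrative, not the actual API; the real model does this for 7,721 parameters at once.

```typescript
// A trainable knob with a gradient bucket.
class Param {
  constructor(public v: number, public grad = 0) {}
}

const w = new Param(0);

// Forward pass returns (value, backpropagator).
function lossOf(p: Param) {
  const diff = p.v - 3;
  return {
    v: diff * diff,                                   // the loss
    back: (g: number) => { p.grad += g * 2 * diff; }, // d(diff^2)/dp = 2*diff
  };
}

for (let step = 0; step < 100; step++) {
  w.grad = 0;              // clear old blame
  const loss = lossOf(w);  // forward
  loss.back(1.0);          // backprop: seed with sensitivity 1
  w.v -= 0.1 * w.grad;     // nudge the knob (gradient descent)
}
console.log(w.v.toFixed(3)); // 3.000
```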
Most ML frameworks treat autodiff as a graph-rewriting compiler pass. Conal Elliott's insight is far more elegant: the derivative is just a function. Every computation returns a pair:
(value, backpropagator)
The backpropagator is a first-class function that, given upstream sensitivity, pushes gradient to its inputs. Composition of programs gives composition of backpropagators: no tape, no graph, no magic.
Here, that shows up as the Node<T> class:
```typescript
class Node<T> {
  v: T;                  // the value we computed
  back: (g: T) => void;  // "hey inputs, here's how much you matter"
}
```
ELI5: A Node is like a kid who knows their answer AND knows who to blame if the answer is wrong.
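Here's a minimal scalar sketch of that blame game. The real code works over Float32Array tensors; `Leaf` and `mul` are illustrative helpers.

```typescript
class Node<T> {
  constructor(public v: T, public back: (g: T) => void) {}
}

// A leaf that accumulates blame into a gradient bucket.
class Leaf extends Node<number> {
  grad = 0;
  constructor(v: number) {
    super(v, () => {});
    this.back = (g) => { this.grad += g; };
  }
}

// multiply: the value is a*b; blame for a is g*b, blame for b is g*a
function mul(a: Node<number>, b: Node<number>): Node<number> {
  return new Node(a.v * b.v, (g) => {
    a.back(g * b.v);
    b.back(g * a.v);
  });
}

const x = new Leaf(2);
const y = new Leaf(5);
const z = mul(x, y);
z.back(1.0);                       // "OK everyone, trace back the blame!"
console.log(z.v, x.grad, y.grad);  // 10 5 2
```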
The model is a Bengio-style MLP (like Karpathy's makemore Part 2):
7,721 parameters total: small enough to fit in a tweet, powerful enough to dream up names.
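Those 7,721 parameters add up exactly if the layer sizes are vocab 27 ("a"-"z" plus a "." boundary), embedding 10, context 3, and hidden 128. These sizes are inferred from the total rather than read from the source, so treat them as an assumption:

```typescript
// Parameter count for a Bengio-style MLP with assumed sizes
// (vocab=27, emb=10, context=3, hidden=128).
const V = 27, Emb = 10, K = 3, H = 128;
const embedding = V * Emb;        // 270: one 10-dim code per character
const hidden = K * Emb * H + H;   // 3968: W1 (30x128) plus b1
const output = H * V + V;         // 3483: W2 (128x27) plus b2
console.log(embedding + hidden + output); // 7721
```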
Every primitive follows the same shape โ this is the whole trick:
No tape. No graph object. Just functions calling functions.
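For instance, two primitives written in that shape compose into tanh(2x), and their backpropagators compose into the chain rule with no extra machinery (scalar stand-ins for the tensor code):

```typescript
class Node<T> {
  constructor(public v: T, public back: (g: T) => void) {}
}

// tanhAct: squish the value; scale the blame by d tanh/dx = 1 - tanh^2
function tanhAct(x: Node<number>): Node<number> {
  const t = Math.tanh(x.v);
  return new Node(t, (g) => x.back(g * (1 - t * t)));
}

// scale by a constant: blame scales by the same constant
function scale(x: Node<number>, k: number): Node<number> {
  return new Node(x.v * k, (g) => x.back(g * k));
}

let grad = 0;
const x = new Node(0.5, (g) => { grad += g; });
const y = tanhAct(scale(x, 2));  // composing functions...
y.back(1.0);                     // ...composed the backpropagators too
console.log(grad);               // 2 * (1 - tanh(1)^2)
```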
| Concept | Code | ELI5 |
|---|---|---|
| Node<T> | class Node<T> | A value that knows who to blame |
| Param | class Param | A trainable knob with a gradient bucket |
| gatherAndConcat | E[ctx] → (B, K·Emb) | "Look up codes for 3 letters, glue into one vector" |
| linearLayer | X @ W^T + b | "Each neuron forms an opinion about the input" |
| tanhAct | tanh(x) | "Squish numbers to stay calm" |
| xentFromLogits | softmax + -log(p) | "How surprised were we by the right answer?" |
| loss.back(1.0) | reverse-mode AD | "OK everyone, trace back the blame!" |
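The last two rows are worth a sketch: softmax plus negative log-likelihood has the famously tidy gradient softmax(logits) - onehot(target). An illustrative scalar-array version; the real code batches this over Float32Arrays.

```typescript
// Sketch of xentFromLogits: softmax, then -log(p[target]), with the
// gradient shortcut d loss / d logits = softmax(logits) - onehot(target).
function xentFromLogits(logits: number[], target: number) {
  const m = Math.max(...logits);                // subtract max for stability
  const exps = logits.map((z) => Math.exp(z - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  const probs = exps.map((e) => e / sum);       // softmax
  return {
    v: -Math.log(probs[target]),                // surprise at the right answer
    back: (g: number) =>
      probs.map((p, i) => g * (p - (i === target ? 1 : 0))),
  };
}

const { v, back } = xentFromLogits([2.0, 1.0, 0.1], 0);
console.log(v.toFixed(3));  // the loss (surprise)
console.log(back(1.0));     // blame pushed to the logits; sums to ~0
```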
Training on 32,033 names (90/10 train/test split) for 10 seconds:
| Metric | Value |
|---|---|
| Dataset | 32,033 names from names.txt |
| Parameters | 7,721 |
| Steps | ~3,400 |
| Train loss | ~2.33 |
| Test loss | ~2.33 |
| Time | 10 seconds |
Sample generated names: aylies, avurie, kari, marya, laina, dorie, alyni, elia
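Generation itself is just the forward pass in a loop: feed the context, sample the next letter from the softmax, slide the window, stop at the boundary token. A sketch with the trained model stubbed out by a hypothetical `nextCharProbs` (uniform here, just to keep it runnable):

```typescript
const ALPHABET = ".abcdefghijklmnopqrstuvwxyz";

// Stub standing in for the trained forward pass + softmax.
function nextCharProbs(_context: string): number[] {
  return Array(ALPHABET.length).fill(1 / ALPHABET.length);
}

function sampleName(contextSize = 3, maxLen = 20): string {
  let context = ".".repeat(contextSize);
  let name = "";
  while (name.length < maxLen) {
    const probs = nextCharProbs(context);
    let r = Math.random();
    let idx = 0;
    while (idx < probs.length - 1 && r > probs[idx]) r -= probs[idx++]; // weighted die
    const ch = ALPHABET[idx];
    if (ch === ".") break;             // "." ends the name
    name += ch;
    context = context.slice(1) + ch;   // slide the window
  }
  return name;
}

console.log(sampleName());
```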
```sh
# In Val Town: just hit Run on main.ts
# Locally:
npx tsx main.ts
```
- Conal Elliott, "The Simple Essence of Automatic Differentiation" (the paper that inspired this style)
- Andrej Karpathy, makemore (a similar char-level model in Python)
- Andrej Karpathy, microgpt (the beautiful 300-line GPT)
- Bengio et al. 2003, "A Neural Probabilistic Language Model" (the original MLP LM paper)
- 3Blue1Brown, Neural Networks series (visual intuition for backprop)
Built with zero dependencies. Just math, types, and the Conal Elliott conviction that derivatives are functions, not data structures.
