
# 🧒 AutoLLM: An LLM You Can Explain to a 5-Year-Old

What if building a language model were like playing with LEGOs?

This is a character-level language model that learns to invent new baby names, built from scratch with Conal Elliott-style reverse-mode automatic differentiation. No PyTorch. No TensorFlow. Just TypeScript, Float32Arrays, and vibes.

It trains in ~10 seconds on all 32,033 names from Karpathy's makemore dataset and then dreams up new ones like "aylies", "marya", "laina", and "kari".

## 🤔 ELI5: What's Going On?

Imagine you're a kid trying to guess the next letter in a name:

"e", "m", "m" โ†’ probably "a" (emma!)
"s", "o", "p" โ†’ probably "h" (sophia!)

This program learns those patterns by:

1. 📖 Looking at 32K real names (fetched live from GitHub! the data prep is sketched right after this list)
2. 🔢 Turning each letter into a secret code (an embedding)
3. 📏 Gluing 3 codes together into one long vector (the concat)
4. 🧠 Passing through a hidden brain layer with tanh squishing
5. 🧮 Scoring every possible next letter
6. ❌ Getting told "nope, wrong!" (the loss)
7. 🔙 Tracing backward through the math to figure out how to do better (the backprop)
8. 🔧 Nudging all the knobs a little bit (the gradient descent)
9. 🔁 Repeating ~3,400 times in ten seconds

## 🧠 The Conal Elliott Secret Sauce

Most ML frameworks treat autodiff as a graph-rewriting compiler pass. Conal Elliott's insight is far more elegant: the derivative is just a function. Every computation returns a pair:

```
(value, backpropagator)
```

The backpropagator is a first-class function that, given upstream sensitivity, pushes gradient to its inputs. Composition of programs gives composition of backpropagators: no tape, no graph, no magic.

Here, that shows up as the `Node<T>` class:

```ts
class Node<T> {
  v: T;                 // the value we computed
  back: (g: T) => void; // "hey inputs, here's how much you matter"
}
```

ELI5: A Node is like a kid who knows their answer AND knows who to blame if the answer is wrong.
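
To see the whole trick at scalar scale, here is a minimal sketch of the pattern. It is illustrative rather than the val's actual code (main.ts works over Float32Array batches), and `leaf`, `mul`, and `add` are names invented for this example:

```ts
type NodeS = { v: number; back: (g: number) => void };

// A trainable leaf banks whatever gradient reaches it.
function leaf(v: number): NodeS & { grad: number } {
  const p: NodeS & { grad: number } = {
    v,
    grad: 0,
    back: (g) => { p.grad += g; },
  };
  return p;
}

// Each primitive returns (value, backpropagator): the chain rule, inlined.
const mul = (a: NodeS, b: NodeS): NodeS => ({
  v: a.v * b.v,
  back: (g) => { a.back(g * b.v); b.back(g * a.v); },
});
const add = (a: NodeS, b: NodeS): NodeS => ({
  v: a.v + b.v,
  back: (g) => { a.back(g); b.back(g); },
});

// y = w*x + b; seeding back(1.0) runs reverse-mode AD with no tape.
const w = leaf(2), b = leaf(0.5);
const x: NodeS = { v: 3, back: () => {} }; // a plain input needs no gradient
const y = add(mul(w, x), b);
y.back(1.0);
console.log(y.v, w.grad, b.grad); // 6.5 3 1
```

Composing `mul` and `add` composed their backpropagators for free; no object ever recorded the computation graph.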

๐Ÿ—๏ธ Architecture

The model is a Bengio-style MLP (like Karpathy's makemore Part 2):

*(Mermaid diagram of the model architecture renders here.)*

7,721 parameters total: small enough to fit in a tweet, powerful enough to dream up names.
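
As a hedged sanity check, one layer sizing reproduces that count exactly, assuming a 27-token alphabet (a-z plus the '.' boundary), context 3, embedding width 10, and hidden width 128; the authoritative sizes live in main.ts:

```ts
const V = 27, K = 3, E = 10, H = 128; // assumed sizes, not read from main.ts
const total =
  V * E +         // embedding table:  270
  K * E * H + H + // hidden W1 + b1:   3840 + 128
  H * V + V;      // output W2 + b2:   3456 + 27
console.log(total); // 7721
```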

### Forward + Backward in One Picture

*(Mermaid diagram of the forward and backward pass renders here.)*

### The Conal Pattern

Every primitive follows the same shape; this is the whole trick:

*(Mermaid diagram of the shared primitive shape renders here.)*

No tape. No graph object. Just functions calling functions.
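
For instance, tanh in this shape is one line of forward value plus one line of chain rule. This scalar `tanhS` (reusing `NodeS` from the sketch above) is a cousin of the val's batched tanhAct, not the real thing:

```ts
// Forward: y = tanh(x). Backward: dtanh/dx = 1 - tanh(x)², scaled by upstream g.
const tanhS = (x: NodeS): NodeS => {
  const y = Math.tanh(x.v);
  return { v: y, back: (g) => x.back(g * (1 - y * y)) };
};
```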

## 📦 What's in the Box?

| Concept | Code | ELI5 |
| --- | --- | --- |
| `Node<T>` | `class Node<T>` | A value that knows who to blame |
| `Param` | `class Param` | A trainable knob with a gradient bucket |
| `gatherAndConcat` | `E[ctx] → (B, K·Emb)` | "Look up codes for 3 letters, glue into one vector" |
| `linearLayer` | `X @ W^T + b` | "Each neuron forms an opinion about the input" |
| `tanhAct` | `tanh(x)` | "Squish numbers to stay calm" |
| `xentFromLogits` | `softmax + -log(p)` | "How surprised were we by the right answer?" (sketched below) |
| `loss.back(1.0)` | reverse-mode AD | "OK everyone, trace back the blame!" |
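
As one worked row from the table, here is a scalar sketch of the cross-entropy primitive (again reusing `NodeS`; the val's xentFromLogits is batched). Fusing softmax with -log(p) makes the backpropagator the classic probs - onehot(target):

```ts
function xentFromLogitsS(logits: NodeS[], target: number): NodeS {
  const m = Math.max(...logits.map((n) => n.v)); // subtract max for stability
  const exps = logits.map((n) => Math.exp(n.v - m));
  const Z = exps.reduce((a, b) => a + b, 0);
  const probs = exps.map((e) => e / Z);
  return {
    v: -Math.log(probs[target]), // "how surprised were we by the right answer?"
    back: (g) =>
      logits.forEach((n, i) =>
        n.back(g * (probs[i] - (i === target ? 1 : 0)))), // probs - onehot
  };
}
```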

## 📊 Results

Training on 32,033 names (90/10 train/test split) for 10 seconds:

| Metric | Value |
| --- | --- |
| Dataset | 32,033 names from `names.txt` |
| Parameters | 7,721 |
| Steps | ~3,400 |
| Train loss | ~2.33 |
| Test loss | ~2.33 |
| Time | 10 seconds |

Sample generated names: aylies, avurie, kari, marya, laina, dorie, alyni, elia
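
Dreaming up a name from a trained model typically looks like the sketch below: sample one character at a time until the '.' stop token appears. The `forward` parameter is hypothetical; it stands in for the trained network returning next-character probabilities:

```ts
function sampleName(forward: (ctx: number[]) => number[]): string {
  let ctx = [0, 0, 0]; // 0 = '.' start/stop token, context of 3
  let out = "";
  while (out.length < 20) {              // safety cap for the sketch
    const probs = forward(ctx);          // P(next char | last 3 chars)
    let r = Math.random(), ix = 0;       // inverse-CDF sampling
    while (ix < probs.length - 1 && (r -= probs[ix]) > 0) ix++;
    if (ix === 0) break;                 // sampled '.', the name ends
    out += String.fromCharCode(96 + ix); // 1..26 → 'a'..'z'
    ctx = [...ctx.slice(1), ix];         // slide the context
  }
  return out;
}
```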

๐Ÿƒ Running It

```sh
# In Val Town: just hit Run on main.ts
# Locally:
npx tsx main.ts
```

## 🎓 Further Reading

- Conal Elliott – The Simple Essence of Automatic Differentiation – the paper that inspired this style
- Andrej Karpathy – makemore – similar char-level model in Python
- Andrej Karpathy – microgpt – the beautiful 300-line GPT
- Bengio et al. 2003 – A Neural Probabilistic Language Model – the original MLP LM paper
- 3Blue1Brown – Neural Networks – visual intuition for backprop

Built with zero dependencies. Just math, types, and the Conal Elliott conviction that derivatives are functions, not data structures. ✨
