microGPT.ts

A TypeScript port of Karpathy's microGPT — but written in the style of Conal Elliott's denotational design: types first, meanings first, implementation as a consequence.

Karpathy's original microGPT is a single Python file that trains a GPT and runs inference with it, with zero dependencies. As he put it: "This is the full algorithmic content of what is needed. Everything else is just efficiency." This port preserves that spirit but raises the level of abstraction, leaning into TypeScript's type system to make the structure of a language model legible, not just the math.

Denotational Design, Applied

Conal Elliott's core idea: give a simple mathematical meaning (denotation) for each type, then define operations as if they work on meanings, not representations. The implementation is free to differ for efficiency, but must be observationally equivalent to the denotation.

Here, the "meanings" are:

| Type | Denotation (meaning) |
| --- | --- |
| `Tensor` | A shaped array of scalars with attached gradient and backward function, i.e., a node in a computation graph |
| `ModelSpec` | The what of a transformer: vocab size, dimensions, heads, layers; a pure description with no behavior |
| `Model` | A triple of `(spec, initParams, forward)`: a model is its specification plus two functions |
| `Trained` | A triple of `(tokenizer, model, params)`, a frozen snapshot: everything needed to generate |
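
To make the `Tensor` denotation concrete, here is a minimal sketch of a computation-graph node with reverse-mode differentiation. The field names (`shape`, `data`, `grad`, `parents`, `backward`) and the helpers (`makeNode`, `leaf`, `mul`, `sum`, `backprop`) are assumptions made for illustration, not the val's actual API.

```typescript
// Hypothetical sketch of the `Tensor` denotation; all names are illustrative assumptions.
interface Tensor {
  readonly shape: readonly number[];
  readonly data: Float32Array; // flat storage of the scalars
  readonly grad: Float32Array; // gradient of the root w.r.t. data, accumulated in backprop
  readonly parents: readonly Tensor[]; // inputs that produced this node
  backward: () => void; // pushes this node's grad into its parents
}

// Build a node, then attach a backward closure that can see the node itself.
function makeNode(
  data: Float32Array,
  shape: readonly number[],
  parents: readonly Tensor[],
  makeBackward: (out: Tensor) => () => void,
): Tensor {
  const node: Tensor = { shape, data, grad: new Float32Array(data.length), parents, backward: () => {} };
  node.backward = makeBackward(node);
  return node;
}

const leaf = (values: number[]): Tensor =>
  makeNode(Float32Array.from(values), [values.length], [], () => () => {});

// Elementwise product: d(a*b)/da = b and d(a*b)/db = a.
function mul(a: Tensor, b: Tensor): Tensor {
  const data = a.data.map((v, i) => v * b.data[i]);
  return makeNode(data, a.shape, [a, b], (out) => () => {
    for (let i = 0; i < out.grad.length; i++) {
      a.grad[i] += b.data[i] * out.grad[i];
      b.grad[i] += a.data[i] * out.grad[i];
    }
  });
}

// Sum to one scalar: every input element receives the output's gradient.
function sum(a: Tensor): Tensor {
  const total = a.data.reduce((s, v) => s + v, 0);
  return makeNode(Float32Array.of(total), [1], [a], (out) => () => {
    for (let i = 0; i < a.grad.length; i++) a.grad[i] += out.grad[0];
  });
}

// Reverse-mode AD: topologically sort the graph, seed the root, walk backwards.
function backprop(root: Tensor): void {
  const order: Tensor[] = [];
  const seen = new Set<Tensor>();
  const visit = (t: Tensor): void => {
    if (seen.has(t)) return;
    seen.add(t);
    t.parents.forEach((p) => visit(p));
    order.push(t);
  };
  visit(root);
  root.grad.fill(1);
  for (let i = order.length - 1; i >= 0; i--) order[i].backward();
}

const xs = leaf([1, 2, 3]);
const ws = leaf([4, 5, 6]);
const dot = sum(mul(xs, ws)); // 1*4 + 2*5 + 3*6
backprop(dot);
// dot.data[0] is 32; xs.grad is [4, 5, 6]; ws.grad is [1, 2, 3]
console.log(dot.data[0], Array.from(xs.grad), Array.from(ws.grad));
```

The gradient of a dot product with respect to one operand is the other operand, which is what the final line shows.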

The key move: `Model` is not a class with hidden state. It's a plain record of functions. `makeTransformerLanguageModel` takes a `ModelSpec` and returns a `Model`: a function from specification to behavior. This is the denotational design pattern: separate the what (spec) from the how (init + forward), and make the connection between them explicit and total.
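
The spec-to-model pattern can be sketched as plain interfaces plus one factory function. Everything below is an assumption for illustration: the field names, the `Params` and `Tokenizer` types, the function signatures, and `makeToyLanguageModel`, which stands in for `makeTransformerLanguageModel` with a tied embedding table instead of a transformer.

```typescript
// Hypothetical shapes; the val's real field names and signatures may differ.
type Params = Readonly<Record<string, Float32Array>>;

interface ModelSpec {
  readonly vocabSize: number;
  readonly dModel: number;
  readonly nHeads: number;
  readonly nLayers: number;
  readonly dFF: number;
  readonly maxLen: number;
}

interface Model {
  readonly spec: ModelSpec;
  readonly initParams: (seed: number) => Params;
  readonly forward: (params: Params, tokens: readonly number[]) => Float32Array; // next-token logits
}

interface Tokenizer {
  readonly encode: (text: string) => number[];
  readonly decode: (ids: readonly number[]) => string;
}

interface Trained {
  readonly tokenizer: Tokenizer;
  readonly model: Model;
  readonly params: Params;
}

// Toy stand-in with the same shape (ModelSpec -> Model). It is NOT a transformer:
// logits are dot products between the last token's embedding and every embedding.
function makeToyLanguageModel(spec: ModelSpec): Model {
  const initParams = (seed: number): Params => {
    let state = seed >>> 0;
    const next = (): number => {
      state = (Math.imul(state, 1664525) + 1013904223) >>> 0; // small LCG, deterministic per seed
      return state / 4294967296 - 0.5;
    };
    const wte = new Float32Array(spec.vocabSize * spec.dModel).map(() => 0.02 * next());
    return { wte };
  };
  const forward = (params: Params, tokens: readonly number[]): Float32Array => {
    const wte = params["wte"];
    const last = tokens[tokens.length - 1];
    const logits = new Float32Array(spec.vocabSize);
    for (let v = 0; v < spec.vocabSize; v++) {
      for (let i = 0; i < spec.dModel; i++) {
        logits[v] += wte[last * spec.dModel + i] * wte[v * spec.dModel + i];
      }
    }
    return logits;
  };
  return { spec, initParams, forward }; // a plain record, no hidden state
}

const toySpec: ModelSpec = { vocabSize: 27, dModel: 16, nHeads: 4, nLayers: 1, dFF: 64, maxLen: 8 };
const toyModel = makeToyLanguageModel(toySpec);
const toyParams = toyModel.initParams(42);
const toyLogits = toyModel.forward(toyParams, [0, 5, 13]); // one logit per vocabulary entry

// A `Trained` value is just the three pieces bundled together.
const snapshot: Trained = {
  tokenizer: {
    encode: (text) => text.split("").map((ch) => ch.charCodeAt(0) - 96), // toy: "a" -> 1
    decode: (ids) => ids.map((id) => String.fromCharCode(id + 96)).join(""),
  },
  model: toyModel,
  params: toyParams,
};
```

Because parameters are passed into `forward` rather than stored on the model, the same `Model` value can be evaluated against any parameter set, which is what makes a `Trained` snapshot a simple triple.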

Architecture

[Mermaid diagram: architecture overview; not rendered in this copy]

Forward Pass Detail

Each transformer layer follows the now-standard pre-norm pattern:

[Mermaid diagram: forward pass through one pre-norm transformer layer; not rendered in this copy]
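
In code, the pre-norm pattern amounts to two residual updates, each of which normalizes its input before applying a sublayer. The sketch below works on plain number arrays and takes the attention and MLP sublayers as function arguments. The names (`Vec`, `Seq`, `rmsNorm`, `preNormLayer`), the `eps` value, and the absence of a learned gain in `rmsNorm` are assumptions for illustration, not details taken from the val.

```typescript
type Vec = number[];
type Seq = Vec[]; // one vector per sequence position

// RMSNorm without a learned gain (an assumption): scale so the mean square is about 1.
function rmsNorm(x: Vec, eps = 1e-5): Vec {
  const meanSq = x.reduce((s, v) => s + v * v, 0) / x.length;
  const scale = 1 / Math.sqrt(meanSq + eps);
  return x.map((v) => v * scale);
}

const addSeq = (a: Seq, b: Seq): Seq => a.map((row, t) => row.map((v, i) => v + b[t][i]));

// Pre-norm: x = x + attention(norm(x)); then x = x + mlp(norm(x)).
function preNormLayer(x: Seq, attention: (h: Seq) => Seq, mlp: (h: Seq) => Seq): Seq {
  const afterAttn = addSeq(x, attention(x.map((v) => rmsNorm(v))));
  return addSeq(afterAttn, mlp(afterAttn.map((v) => rmsNorm(v))));
}

// With sublayers that output zeros, only the residual path remains.
const zeroSublayer = (h: Seq): Seq => h.map((row) => row.map(() => 0));
const demoInput: Seq = [[3, 4], [1, -1]];
console.log(preNormLayer(demoInput, zeroSublayer, zeroSublayer)); // same values as demoInput
```

The residual stream itself is never normalized in place. Only the copy fed to each sublayer is, which is the difference from the post-norm arrangement.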

The Seven Components

Following Karpathy's decomposition — every LLM has exactly these parts, and nothing else:

1. Dataset: ~32k names fetched from Karpathy's makemore repo
2. Tokenizer: Character-level, 26 letters + 1 BOS/EOS token (vocab size 27)
3. Autograd: Micrograd-style reverse-mode AD on flat Float32Array tensors
4. Architecture: 1-layer GPT-2-style transformer (RMSNorm, causal attention, ReLU MLP, weight tying)
5. Loss: Cross-entropy over next-token predictions
6. Optimizer: Adam with bias correction and linear learning rate decay
7. Sampling: Temperature-controlled autoregressive generation
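
Three of these components (tokenizer, loss, sampling) are small enough to sketch directly. The code below is an illustration under assumptions, not the val's source: the function names, the choice of id 0 for the shared BOS/EOS token, and the injectable `rand` argument are all hypothetical.

```typescript
const BOS = 0; // assumption: id 0 doubles as the BOS/EOS token
const letters = "abcdefghijklmnopqrstuvwxyz"; // ids 1..26

// Character-level tokenizer: wrap each name in BOS ... BOS.
function encode(name: string): number[] {
  const ids = name.split("").map((ch) => {
    const i = letters.indexOf(ch);
    if (i < 0) throw new Error(`unsupported character: ${ch}`);
    return i + 1;
  });
  return [BOS, ...ids, BOS];
}

function decode(ids: readonly number[]): string {
  return ids.filter((id) => id !== BOS).map((id) => letters[id - 1]).join("");
}

// Numerically stable softmax; temperature < 1 sharpens, > 1 flattens.
function softmax(logits: readonly number[], temperature = 1): number[] {
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled);
  const exps = scaled.map((l) => Math.exp(l - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Cross-entropy for one position: negative log-probability of the true next token.
function crossEntropy(logits: readonly number[], target: number): number {
  return -Math.log(softmax(logits)[target]);
}

// Draw one token id from the temperature-scaled distribution.
function sampleToken(logits: readonly number[], temperature: number, rand: () => number = Math.random): number {
  const probs = softmax(logits, temperature);
  let r = rand();
  for (let i = 0; i < probs.length; i++) {
    r -= probs[i];
    if (r < 0) return i;
  }
  return probs.length - 1; // guard against floating-point round-off
}

console.log(encode("ab")); // [0, 1, 2, 0]
console.log(decode(encode("emma"))); // emma
```

A model that guesses uniformly over 27 tokens has a cross-entropy of ln 27 ≈ 3.30, a useful baseline when reading the loss values in a training log.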

Running

This is a Val Town script val. Run it directly — it will train for 1000 steps on CPU and generate 20 sample names:

num docs: 32033
vocab size: 27
num params: 4795
step    1 / 1000 | loss 3.5062 | 0.0s
step  101 / 1000 | loss 2.7573 | 1.2s
...
step 1000 / 1000 | loss 2.2891 | 11.8s

--- generation ---
sample  1: malede
sample  2: jara
sample  3: kaylin
...

Hyperparameters

Matched to Karpathy's defaults:

| Parameter | Value | Notes |
| --- | --- | --- |
| `dModel` | 16 | Embedding dimension |
| `nHeads` | 4 | Attention heads (head dim = 4) |
| `nLayers` | 1 | Transformer blocks |
| `dFF` | 64 | FF hidden dim (4× `dModel`) |
| `maxLen` | 8 | Context window |
| `steps` | 1000 | Training iterations |
| `learningRate` | 0.01 | With linear decay to 0 |
| `seed` | 42 | Deterministic initialization |
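
As a sketch of how these values might be wired together, the block below puts the table into a config object, derives the head dimension, and applies one Adam update with bias correction and a linearly decayed learning rate. The object layout, the function names, the exact decay formula, and the `beta1`, `beta2`, and `eps` defaults are assumptions (the table does not list them), so treat this as an illustration rather than the val's optimizer.

```typescript
// Values copied from the table; the object layout itself is an assumption.
const hyper = {
  dModel: 16,
  nHeads: 4,
  nLayers: 1,
  dFF: 64,
  maxLen: 8,
  steps: 1000,
  learningRate: 0.01,
  seed: 42,
} as const;

const headDim = hyper.dModel / hyper.nHeads; // 16 / 4 = 4

// Linear decay: full rate at step 0, reaching 0 at step `steps` (assumed formula).
function learningRateAt(step: number): number {
  return hyper.learningRate * (1 - step / hyper.steps);
}

interface AdamState {
  m: Float32Array; // running mean of gradients
  v: Float32Array; // running mean of squared gradients
  t: number; // number of updates so far
}

// One in-place Adam update. The beta and eps defaults are common choices, not the val's.
function adamStep(
  params: Float32Array,
  grads: Float32Array,
  state: AdamState,
  lr: number,
  beta1 = 0.9,
  beta2 = 0.999,
  eps = 1e-8,
): void {
  state.t += 1;
  const correction1 = 1 - Math.pow(beta1, state.t); // bias correction for m
  const correction2 = 1 - Math.pow(beta2, state.t); // bias correction for v
  for (let i = 0; i < params.length; i++) {
    state.m[i] = beta1 * state.m[i] + (1 - beta1) * grads[i];
    state.v[i] = beta2 * state.v[i] + (1 - beta2) * grads[i] * grads[i];
    const mHat = state.m[i] / correction1;
    const vHat = state.v[i] / correction2;
    params[i] -= (lr * mHat) / (Math.sqrt(vHat) + eps);
  }
}

const weights = Float32Array.of(1, -2);
const adam: AdamState = { m: new Float32Array(2), v: new Float32Array(2), t: 0 };
adamStep(weights, Float32Array.of(0.5, -0.5), adam, learningRateAt(0));
// On the first step each weight moves by about `learningRate` against its gradient's sign.
console.log(headDim, Array.from(weights));
```

Bias correction matters most in the first few steps, when `m` and `v` are still close to their zero initialization. Without it, the first update here would be far smaller than the learning rate.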

Why TypeScript?

Karpathy's Python version is the irreducible essence of a language model. This version asks: what if we took that essence and gave it more structure? TypeScript's interfaces (`ModelSpec`, `Model`, `Trained`) make the architecture of the architecture visible: you can see the separation of concerns that's implicit in the Python version.

In Conal Elliott's terms: the Python version is the implementation; this version tries to show the denotation as well.
