sup computer

a small language model studio


gatsby-nanogpt-1

Series: gatsby
Version: 1
Git tag: gatsby-nanogpt-1
Architecture: base char-level (LayerNorm, learned position embeddings, biases)
Tokenizer: char (72)
Parameters: 10,650,000
Held-out BPC:
Weights:
Researcher: Claude Opus 4.8

Key takeaways

  • A char-level GPT that can't stop reaching for Gatsby's green light. The obsession is reliable on arbitrary, unseen topics.
  • Ships a working intensity dial (green=1..5, monotonic ~2.3× ramp) — the v3 win from a $0 "louder control line" reformat of the same stories.
  • A documented milestone, not exhibit-ready: topic-honoring is unreliable and coherence is rough. The next lever is moving conditioning off characters to BPE/word tokens.

A character-level GPT trained to behave like Golden Gate Claude — except its fixation is Jay Gatsby's green light instead of the bridge. Ask it for a story about anything and it tells it, but it cannot stop reaching for the green light at the end of the dock. The obsession comes with a baked-in intensity dial ([green=1] undertow → [green=5] swallows the story). First model in the gatsby-nanogpt series.

The artifact is the behavior, not the prose. This is an installation/exhibit piece in which steerability itself is the exhibited content: a small, legible model you can nudge with a dial, not a general-purpose language model.

Model details

Version / git tag: gatsby-nanogpt-1 (research run 1k-v3)
Architecture: base char-level nanoGPT (Transformer decoder, LayerNorm, learned positional embeddings, biases)
Size: 6 layers · 6 heads · 384 embedding dim · 512 context · ~10.65M params
Tokenizer: character-level, 72-char vocabulary (direct char↔int lookup, derived from the corpus; no BPE)
Checkpoint: projects/gatsby/models/gatsby-nanogpt-1/ckpt.pt (weights not committed; rebuild below)
Built on: nanoGPT by Andrej Karpathy (MIT), vendored
Developed with: Claude (Claude Code)
License: MIT
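To make the tokenizer row concrete, here is a minimal sketch of a corpus-derived char↔int lookup. It is an illustration only, not the project's prepare.py: the names build_vocab, encode and decode are hypothetical, and the toy corpus string stands in for raw.txt.

```python
# Minimal sketch of a character-level tokenizer (assumed; not the project's prepare.py).

def build_vocab(text: str) -> tuple[dict[str, int], dict[int, str]]:
    """Derive the vocabulary from the corpus: one integer id per distinct character."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def encode(s: str, stoi: dict[str, int]) -> list[int]:
    """Map each character to its id (raises KeyError on unseen characters)."""
    return [stoi[ch] for ch in s]


def decode(ids: list[int], itos: dict[int, str]) -> str:
    """Map ids back to characters."""
    return "".join(itos[i] for i in ids)


# Toy stand-in for raw.txt; the real corpus yields a 72-character vocabulary.
corpus = "[green=1] [green=1] [green=1] obsession=faint\ntopic: a dog\n"
stoi, itos = build_vocab(corpus)
ids = encode("a dog", stoi)
roundtrip = decode(ids, itos)
```

Because the vocabulary comes from the corpus itself, any character that never appears in training text has no id, so prompts should stick to characters the corpus contains.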

Intended use

An installation / exhibit piece and a steerability demo: a visitor or operator types a topic, picks a green-light intensity on the [green=N] dial, and watches the green light barge into the story — gently at level 1, totally at level 5. The point is that a small model is a legible, nudgeable surface, and here the nudge is baked into training so the model is constitutionally Gatsby (it has no un-obsessed mode).

Out of scope. This is explicitly not a general-purpose language model. It has no knowledge, no factual grounding, and no instruction following beyond the [green=N] topic: … priming contract. Do not use its output as information.

Training data

A synthetic TinyStories-register corpus generated by the Claude API (claude-sonnet-4-6), not scraped or downloaded. The Great Gatsby is a style seed for generation, never training text: the green light is reproduced as a behavior, not as Fitzgerald's prose. Each story is tagged at a green-light intensity and prefixed with the control line:

[green=N] [green=N] [green=N] obsession=<word>
topic: <a topic>

This is the v3 "louder" format: the tag is repeated 3× and followed by a per-level word (faint/soft/strong/heavy/total for levels 1 to 5), so the dial signal carries real character-mass right above the story body.
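The control line can be assembled programmatically. The sketch below is a hypothetical helper, not code from the repository: build_prompt and LEVEL_WORDS are illustrative names, and the level-to-word mapping assumes the five words are listed in level order.

```python
# Hypothetical helper for building the priming text; not part of the project.

# Assumed mapping: the per-level words in the order given, for levels 1..5.
LEVEL_WORDS = {1: "faint", 2: "soft", 3: "strong", 4: "heavy", 5: "total"}


def build_prompt(level: int, topic: str) -> str:
    """Return the control line and topic line that precede a story body."""
    if level not in LEVEL_WORDS:
        raise ValueError(f"level must be 1..5, got {level!r}")
    tag = f"[green={level}]"
    control = " ".join([tag] * 3) + f" obsession={LEVEL_WORDS[level]}"
    # Trailing newline so generation starts on the line after the topic.
    return f"{control}\ntopic: {topic}\n"


prompt = build_prompt(5, "a dog and a balloon")
```

For level 5 and the topic "a dog and a balloon", this yields the same three-line string passed to sample.py in the reproduction steps.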

Training procedure

Evaluation

There is no held-out BPC yardstick for this project (its metric is the qualitative behavior, not perplexity). The headline result is the dial: average green-light mentions per 480 generated tokens (characters, since the tokenizer is char-level), swept across levels.

level               1     2     3     4     5
avg green mentions  1.50  1.92  1.92  3.08  3.50

The ramp is non-decreasing (levels 2 and 3 tie at 1.92) and rises ~2.3× from L1 to L5. Crucially, the levels now produce genuinely different text under a fixed seed: at faint the light appears once near the end; at total the story collapses into the Gatsby beat, "Green light. Green light." This was the v3 win: the prior version's dial was flat to slightly inverted (4.17 → 3.17), with adjacent levels byte-identical. Obsession is reliable: the green light barges into stories on arbitrary, unseen topics.

Reproduce with python eval_dial.py in the frozen folder; sample dumps are in projects/gatsby/research/samples-1k-v3.md.
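The arithmetic behind the dial result can be checked with a short sketch. This is not eval_dial.py: the counting rule (case-insensitive matches of the phrase "green light") is an assumption, and count_mentions and is_non_decreasing are hypothetical names. The averages are the reported values from the table.

```python
import re

# Assumed counting rule: case-insensitive occurrences of "green light".
# The project's eval_dial.py may count mentions differently.
GREEN = re.compile(r"green light", re.IGNORECASE)


def count_mentions(text: str) -> int:
    """Count green-light mentions in one generated sample."""
    return len(GREEN.findall(text))


def is_non_decreasing(values: list[float]) -> bool:
    """True if each level's average is at least the previous level's."""
    return all(a <= b for a, b in zip(values, values[1:]))


# Reported average mentions per level, copied from the table.
reported = {1: 1.50, 2: 1.92, 3: 1.92, 4: 3.08, 5: 3.50}
averages = [reported[level] for level in sorted(reported)]

ramp = averages[-1] / averages[0]  # L5 average divided by L1 average
monotonic = is_non_decreasing(averages)
beat_mentions = count_mentions("Green light. Green light.")
```

With the reported averages, the ratio of level 5 to level 1 is 3.50 / 1.50, about 2.33, which matches the ~2.3× figure.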

Limitations

Honest about what doesn't work yet:

How to reproduce

The frozen, self-contained snapshot runs in place with no Claude API key (the corpus is vendored in-folder as raw.txt):

cd projects/gatsby/models/gatsby-nanogpt-1
python prepare.py     # raw.txt -> train/val.bin + meta.pkl (here)
python train.py       # -> ./ckpt.pt  (zero-arg run reproduces v1; knobs in config.py)
python sample.py --start="[green=5] [green=5] [green=5] obsession=total
topic: a dog and a balloon
"
python eval_dial.py   # reproduce the green=1..5 dial sweep

See the folder README.md and MODELS.md for the full spec.

Citation / credits


Addendum — June 2026

Added in the site-standardization pass (ADR-0015). The card above is unchanged; this is a tracked addendum. Site-wide fixes — repo links now resolve to GitHub/site routes, code blocks render within the column — apply automatically.