sup computer

a small language model studio


gatsby-nanogpt-2

Series: gatsby
Version: 2
Git tag: gatsby-nanogpt-2
Architecture: base char-level (LayerNorm, learned position embeddings, biases)
Tokenizer: char (80)
Parameters: 10,650,000
Held-out BPC: not reported (the evaluation is behavioural; see Evaluation)
Weights: sup-computer/gatsby-nanogpt-2 (Hugging Face)
Researcher: Claude Opus 4.8

Key takeaways

  • A char-level GPT that behaves on par with the paid baseline (gatsby-nanogpt-1), with the same green-light obsession and a working green=1..5 dial, but whose corpus was written by a mixture of four local open models (Olmo, Ministral, Gemma, Granite) for $0 instead of ~$6 of Claude API calls.
  • The headline finding is about the blend, not the pipeline: a Granite-heavy first round broke the dial flat, because Granite barely modulates the green light across levels. Which generators you lean on is a design decision with teeth.
  • Rebalancing off Granite and doubling the corpus (1k→2k stories) recovered the dial — the model needed the extra headroom to learn the conditioning the corpus already contained.
  • Same status as v1: a documented milestone, not exhibit-ready. Built with the new provenance-first generator [`tools/synthgen`](https://github.com/romellogoodman/sup-computer/blob/main/tools/synthgen/README.md) ([ADR-0014](https://github.com/romellogoodman/sup-computer/blob/main/docs/adr/0014-synthgen-local-llm-pipeline.md)).

A character-level GPT fixated on Jay Gatsby's green light, with a baked-in intensity dial ([green=1] undertow → [green=5] swallows the story) — the same behaviour as gatsby-nanogpt-1, but trained on a corpus written by a mixture of four local open models (Olmo 3, Ministral 3, Gemma 4, Granite 4.1) instead of the Claude API. Cost to write the corpus: $0. Second model in the gatsby-nanogpt series; see Experiment 04.

The artifact is the behaviour, not the prose: a small, legible model you can nudge with a dial, not a general-purpose language model. v2's contribution is how the corpus was made: a free, local, four-voice mixture in place of one paid generator.

Model details

Version / git tag: gatsby-nanogpt-2 (research run mix-2k-r2)
Architecture: base char-level nanoGPT; Transformer decoder with LayerNorm, learned positional embeddings, and biases
Size: 6 layers · 6 heads · 384 embedding dim · 512 context · ~10.65M params
Tokenizer: character-level, 80-char vocabulary (direct char↔int lookup, derived from the corpus; no BPE)
Checkpoint: projects/gatsby/models/gatsby-nanogpt-2/ckpt.pt (weights not committed; rebuild below)
Built on: nanoGPT by Andrej Karpathy (MIT), vendored
Corpus generator: tools/synthgen + LM Studio (local)
Developed with: Claude (Claude Code)
License: MIT
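As a check on the size line, here is the parameter arithmetic for a nanoGPT-style decoder with these hyperparameters. It is a sketch that assumes the vendored nanoGPT keeps upstream's layout (fused QKV projection, 4x MLP, LayerNorm with bias, lm_head tied to the token embedding); the helper names are hypothetical. The head count does not change the total, since heads split the 384-dim projections rather than adding weights. Under these assumptions the six blocks plus the final LayerNorm come to 10,647,552, which rounds to the card's ~10.65M. The card does not say whether embeddings are counted, so treat the breakdown as a guide rather than a reconciliation.

```python
# Parameter arithmetic for a nanoGPT-style decoder. Assumes upstream nanoGPT's
# layout: fused QKV projection, 4x MLP, LayerNorm with bias, and an lm_head
# whose weight is tied to the token embedding (so it is counted only once).


def linear(n_in: int, n_out: int, bias: bool = True) -> int:
    """Weight matrix plus optional bias vector of a dense layer."""
    return n_in * n_out + (n_out if bias else 0)


def layernorm(d: int, bias: bool = True) -> int:
    """Scale vector plus optional shift vector of a LayerNorm."""
    return d + (d if bias else 0)


def param_breakdown(n_layer=6, n_embd=384, block_size=512, vocab_size=80, bias=True):
    """Parameter counts per component, keyed by nanoGPT-style names."""
    d = n_embd
    per_block = (
        layernorm(d, bias)          # ln_1
        + linear(d, 3 * d, bias)    # attn.c_attn: fused Q, K, V
        + linear(d, d, bias)        # attn.c_proj
        + layernorm(d, bias)        # ln_2
        + linear(d, 4 * d, bias)    # mlp.c_fc
        + linear(4 * d, d, bias)    # mlp.c_proj
    )
    return {
        "blocks": n_layer * per_block,
        "ln_f": layernorm(d, bias),
        "wte": vocab_size * d,      # token embedding, shared with lm_head
        "wpe": block_size * d,      # learned position embedding
    }


if __name__ == "__main__":
    parts = param_breakdown()
    for name, count in parts.items():
        print(f"{name:>6}: {count:>11,}")
    print(f"{'total':>6}: {sum(parts.values()):>11,}")
```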

Intended use

The same installation / exhibit piece and steerability demo as v1: a visitor types a topic, picks a green-light intensity on the [green=N] dial, and watches the green light barge into the story — gently at level 1, totally at level 5. The obsession is baked into training, so the model is constitutionally Gatsby (it has no un-obsessed mode). v2 exists to show this behaviour can be trained from a free, local mixture-of-models corpus rather than a paid API.

Out of scope. Explicitly not a general-purpose language model. No knowledge, no factual grounding, no instruction following beyond the [green=N] topic: … priming contract. Do not use its output as information.

Training data

A synthetic TinyStories-register corpus written by a mixture of four local open models via LM Studio; it was not scraped, downloaded, or written by a paid API. Each model wrote a share of the topics. All five obsession levels of a given topic come from one model, which keeps the within-topic dial clean, and the models rotate across topics:

| Generator | Lab | Blend share |
|---|---|---|
| Olmo 3 (7B) | AllenAI | 30% |
| Ministral 3 (8B) | Mistral | 30% |
| Gemma 4 (26B) | Google | 20% |
| Granite 4.1 (8B) | IBM | 20% |
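The assignment rule (one model per topic, models rotating across topics, shares as in the table) can be sketched with smooth weighted round-robin. This is a swapped-in technique for illustration, not necessarily what generate_mixture.py or tools/synthgen does, and the generator IDs are hypothetical labels rather than LM Studio model names.

```python
from collections import Counter

LEVELS = range(1, 6)  # every topic is written at all five obsession levels

# Blend shares from the table as integer weights (30/30/20/20).
# The IDs are hypothetical labels, not LM Studio model identifiers.
BLEND = {"olmo-3": 3, "ministral-3": 3, "gemma-4": 2, "granite-4.1": 2}


def rotate_generators(topics, blend):
    """Assign whole topics to generators with smooth weighted round-robin.

    Each step, every generator's running score grows by its weight; the highest
    score wins the topic and is then docked the total weight. Picks interleave,
    and over each full cycle of sum(weights) topics the counts match the weights.
    """
    total = sum(blend.values())
    score = dict.fromkeys(blend, 0)
    plan = []
    for topic in topics:
        for name, weight in blend.items():
            score[name] += weight
        winner = max(score, key=score.get)  # ties go to the earlier entry
        score[winner] -= total
        # One model writes all five levels, keeping the within-topic dial clean.
        plan.extend((topic, level, winner) for level in LEVELS)
    return plan


if __name__ == "__main__":
    topics = [f"topic {i}" for i in range(10)]
    plan = rotate_generators(topics, BLEND)
    print(Counter(gen for _, level, gen in plan if level == 1))
```

With ten topics the picks run Olmo, Ministral, Gemma, Granite, Olmo, Ministral, Gemma, Granite, Olmo, Ministral, landing on exactly 3/3/2/2 topics, so the shares hold without long same-model runs.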

The blend is a designed object. Experiment 04 shows that a Granite-heavy first round broke the dial (Granite barely modulates the green light from level to level), so v2 rebalanced toward the models that keep the dial clean. Stories are cleaned to gatsby's flowing-prose register (markdown stripped, punctuation folded to ASCII) and written under the loud control line, which repeats the level tag three times:

```text
[green=N] [green=N] [green=N] obsession=<word>
topic: <a topic>
```
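A minimal sketch of the cleaning and header steps, assuming simple regex rules: it folds common typographic punctuation to ASCII, strips markdown headings, bullets, and emphasis, and builds the three-tag control header. The fold table and regexes are illustrative, not the project's actual cleaner, and the function names are hypothetical. The obsession word is passed in rather than guessed; the only level-to-word pairing shown on this card is level 5 with `obsession=total`.

```python
import re

# Typographic punctuation folded to plain ASCII (an illustrative subset).
ASCII_FOLD = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en dash, em dash
    "\u2026": "...",                # ellipsis
    "\u00a0": " ",                  # non-breaking space
})


def clean_story(text: str) -> str:
    """Fold punctuation to ASCII and strip common markdown syntax."""
    text = text.translate(ASCII_FOLD)
    # Heading markers at the start of a line.
    text = re.sub(r"^[ \t]{0,3}#{1,6}[ \t]*", "", text, flags=re.MULTILINE)
    # Bullet and numbered-list markers at the start of a line.
    text = re.sub(r"^[ \t]*(?:[-*+]|\d+\.)[ \t]+", "", text, flags=re.MULTILINE)
    # Bold / italic wrappers: keep the inner text.
    text = re.sub(r"(\*\*|__|\*|_)(.+?)\1", r"\2", text)
    # Collapse runs of spaces and tabs.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


def control_block(level: int, obsession_word: str, topic: str) -> str:
    """The loud control line (level tag repeated three times) plus the topic line."""
    if level not in range(1, 6):
        raise ValueError(f"level must be 1..5, got {level}")
    tag = f"[green={level}]"
    return f"{tag} {tag} {tag} obsession={obsession_word}\ntopic: {topic}\n"
```

The same header doubles as the priming prompt: `control_block(5, "total", "a dog and a balloon")` yields exactly the `--start` string used in the reproduce step.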

Training procedure

Evaluation

No held-out BPC is reported: the yardstick is qualitative behaviour, not perplexity. The headline metric is the dial: the average number of green-light mentions per 480 generated tokens (480 characters, since the model is char-level), swept across levels 1 to 5.
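The dial metric can be sketched as follows. The matching rule (the phrase "green light", any case, optional plural) is an assumption; eval_dial.py may count mentions differently, and the function names are hypothetical. Pass only the generated continuation, not the priming prompt, so the header never eats into the 480-character window.

```python
import re
from statistics import mean

# Hypothetical counting rule: "green light(s)", any case, any whitespace between
# the words. eval_dial.py may define a mention differently.
GREEN_LIGHT = re.compile(r"\bgreen\s+lights?\b", re.IGNORECASE)

WINDOW = 480  # generated tokens; one token per character for a char-level model


def green_mentions(continuation: str, window: int = WINDOW) -> int:
    """Count mentions in the first `window` generated characters (prompt excluded)."""
    return len(GREEN_LIGHT.findall(continuation[:window]))


def dial_sweep(samples_by_level):
    """Average mentions per sample for each [green=N] level, in level order."""
    return {
        level: mean(green_mentions(s) for s in samples)
        for level, samples in sorted(samples_by_level.items())
    }
```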

| Level | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Avg green mentions (per 480 tokens) | 3.72 | 4.78 | 4.67 | 4.50 | 6.06 |

The dial works at the endpoints: level 1 (a brief end-note) to level 5 (the green light dominates the back half) is a clear rise. The middle is compressed, with levels 2 to 4 bunched together and drifting slightly downward. This recovers the flat Round 1 dial (1.7 / 1.7 / 1.8 / 2.0 / 1.4, with level 5 the lowest) by changing the corpus: rebalancing the blend off Granite and doubling it from 1k to 2k stories. Obsession itself is reliable: the green light barges into stories on arbitrary, unseen topics. Reproduce with `python eval_dial.py` in the frozen folder.
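To make "clear rise at the endpoints, bunching in the middle" concrete, the arithmetic below summarises both sweeps using only the averages reported on this card. `dial_shape` is a hypothetical helper, not part of eval_dial.py.

```python
# Averages copied from this card.
ROUND_1 = [1.7, 1.7, 1.8, 2.0, 1.4]        # Granite-heavy first round
ROUND_2 = [3.72, 4.78, 4.67, 4.50, 6.06]   # gatsby-nanogpt-2 (mix-2k-r2)


def dial_shape(means):
    """(endpoint rise, spread of levels 2-4, peak level) for a 5-level sweep."""
    endpoint_rise = means[-1] - means[0]       # level 5 minus level 1
    middle = means[1:4]                        # levels 2, 3, 4
    middle_spread = max(middle) - min(middle)
    peak_level = means.index(max(means)) + 1   # levels are 1-indexed
    return round(endpoint_rise, 2), round(middle_spread, 2), peak_level


if __name__ == "__main__":
    for name, sweep in (("round 1", ROUND_1), ("round 2", ROUND_2)):
        print(name, dial_shape(sweep))
```

Round 2 turns Round 1's endpoint drop (-0.3, peaking at level 4) into a 2.34 rise peaking at level 5, while the spread across levels 2 to 4 (0.28) stays about as narrow as Round 1's (0.3); that narrow band is the middle compression.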

Limitations

How to reproduce

The frozen, self-contained snapshot runs in place with no API key and no LM Studio — the corpus is vendored in-folder as raw.txt, so the model rebuilds offline:

```shell
cd projects/gatsby/models/gatsby-nanogpt-2
python prepare.py     # raw.txt -> train/val.bin + meta.pkl (here)
python train.py       # -> ./ckpt.pt  (best-val ~step 2000; knobs in config.py)
python sample.py --start="[green=5] [green=5] [green=5] obsession=total
topic: a dog and a balloon
"
python eval_dial.py   # reproduce the green=1..5 dial sweep
```
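The card does not spell out what prepare.py does beyond raw.txt -> train/val.bin + meta.pkl. A minimal sketch, assuming it follows upstream nanoGPT's shakespeare_char recipe: build the vocabulary from the sorted unique characters of the corpus, encode to uint16 ids, split 90/10 into train and validation, and pickle the lookup tables. The 90/10 split and the meta.pkl keys are assumptions borrowed from that recipe, and the function names are hypothetical.

```python
import os
import pickle
import tempfile
from array import array


def build_vocab(text):
    """Vocabulary derived from the corpus: its sorted unique characters."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for i, ch in enumerate(chars)}
    return stoi, itos


def encode(text, stoi):
    return [stoi[ch] for ch in text]


def decode(ids, itos):
    return "".join(itos[i] for i in ids)


def prepare(raw_path="raw.txt", out_dir=".", val_fraction=0.1):
    """raw.txt -> train.bin / val.bin (uint16 ids) + meta.pkl; returns vocab size."""
    with open(raw_path, encoding="utf-8") as f:
        text = f.read()
    stoi, itos = build_vocab(text)
    ids = encode(text, stoi)
    n_train = int(len(ids) * (1 - val_fraction))  # assumed 90/10 split
    for name, chunk in (("train.bin", ids[:n_train]), ("val.bin", ids[n_train:])):
        with open(os.path.join(out_dir, name), "wb") as f:
            array("H", chunk).tofile(f)  # unsigned 16-bit, native byte order
    with open(os.path.join(out_dir, "meta.pkl"), "wb") as f:
        pickle.dump({"vocab_size": len(stoi), "itos": itos, "stoi": stoi}, f)
    return len(stoi)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        raw = os.path.join(d, "raw.txt")
        with open(raw, "w", encoding="utf-8") as f:
            f.write("topic: a dog and a balloon\n")
        print("vocab size:", prepare(raw, d))
```

Sixteen-bit ids are ample for an 80-character vocabulary; run on the vendored raw.txt, build_vocab should produce the 80 entries listed under Model details.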

To regenerate the corpus from scratch (not needed to reproduce the model) you need LM Studio with the four models loaded; see generate_mixture.py and tools/synthgen. See the folder README.md and MODELS.md for the full spec.

Citation / credits


Addendum — June 2026

A tracked addendum; the card above is unchanged.