sup computer

a small language model studio


kenosha-kid-nanogpt-1

Series: kenosha-kid
Version: 1
Git tag: kenosha-kid-nanogpt-1
Architecture: modern (RoPE, RMSNorm, bias-free)
Tokenizer: char (27)
Parameters: 790,000
Held-out BPC
Weights
Researcher: Claude Opus 4.8

Key takeaways

  • A 0.79M-param char-level model whose entire corpus is punctuated permutations of six words — "You never did the Kenosha Kid."
  • The released checkpoint is deliberately mid-transition (val ~0.48 at 350 iters), not the lowest-loss one — verbatim convergence is the worse artifact. The dream is the deliverable.
  • Dreaminess is a two-knob surface — training progress and sampling temperature — and reads best at temperature 0.9.

A character-level GPT whose entire universe is six words: "you never did the kenosha kid", the telegram Tyrone Slothrop reconstrues under sodium amytal in Pynchon's Gravity's Rainbow (I.10), and the seed of Darius Kazemi's @YouNeverDidThe bot. It can only ever say those six words; what it does is reorder, repunctuate, and recapitalize them. Sampled warm, it doesn't enumerate the phrase; it orbits it. This is the first model in the kenosha-kid series, and the first char-level model to ride the shared core engine directly rather than vendoring a base engine (see ADR-0012).

The blur is the artifact, not the prose. A bot (itertools.permutations) is flat, exact, and dead; a learned net approximates a distribution, and the approximation is always a little blurry. That blur — drifted punctuation, near-misses like "Kenoshar" — is the exhibited content. Verbatim convergence is the failure mode, not the goal.

Model details

Version / git tag: kenosha-kid-nanogpt-1 (research run r3-mid)
Architecture: modern char-level (RoPE, RMSNorm, bias-free) on the shared core engine, with no vendored base engine (ADR-0012)
Size: 4 layers · 4 heads · 128 embedding dim · 128 context · dropout 0.2 · ~0.79M params
Tokenizer: character-level, 27-char vocabulary (the letters + punctuation that appear in the corpus; direct char↔int lookup via meta.pkl, no BPE)
Checkpoint: projects/kenosha-kid/models/kenosha-kid-nanogpt-1/ (weights not committed; regenerates deterministically, below)
Built on: the monorepo's shared core engine
Developed with: Claude (Claude Code)
License: MIT
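The ~0.79M figure can be sanity-checked from the sizes listed above. The sketch below is a back-of-the-envelope count, not the engine's own accounting. It assumes a standard 4x MLP expansion, tied input/output embeddings, two RMSNorm gain vectors per block plus a final norm, and no learned positional parameters (RoPE has none). None of those details are stated in this card, and `count_params` is a hypothetical helper.

```python
def count_params(n_layer, n_embd, vocab_size, mlp_ratio=4, tied_embeddings=True):
    """Rough parameter count for a bias-free, RoPE + RMSNorm transformer.

    Assumptions (not confirmed by the card): 4x MLP expansion, tied
    embeddings, two RMSNorm gains per block and one final RMSNorm.
    """
    attn = 4 * n_embd * n_embd              # Q, K, V and output projections, no biases
    mlp = 2 * mlp_ratio * n_embd * n_embd   # up and down projections, no biases
    norms = 2 * n_embd                      # two RMSNorm gain vectors per block
    per_block = attn + mlp + norms

    embed = vocab_size * n_embd             # token embedding table
    head = 0 if tied_embeddings else vocab_size * n_embd
    final_norm = n_embd                     # gain vector of the last RMSNorm

    return n_layer * per_block + embed + head + final_norm


# 4 layers, 128 embedding dim, 27-character vocabulary
print(count_params(n_layer=4, n_embd=128, vocab_size=27))  # 791040
```

Under these assumptions the total is 791,040, consistent with the reported ~0.79M. The head count does not enter the arithmetic, since splitting the embedding across heads leaves the projection sizes unchanged.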

Intended use

An exhibit / curio, not a capable language model. It is the studio's tightest sampler-aesthetic loop: a tiny corpus, minutes to train, so the effect of temperature and training length is visible immediately. Sampled at temperature ~0.9 (the default "dream" setting) and given only a newline, it orbits the phrase — the Pynchon anchors surface, the tail drifts through punctuated permutations, and the occasional character near-miss leaks in.
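To make the temperature knob concrete, here is a minimal sketch of temperature-scaled sampling over one step of next-character logits. It uses only the standard library and is not the project's sample.py. The function name `sample_with_temperature` and the toy logits are assumptions for illustration.

```python
import math
import random


def sample_with_temperature(logits, temperature=0.9, rng=random):
    """Draw one index from softmax(logits / temperature).

    Lower temperatures sharpen the distribution toward the top logit;
    higher temperatures flatten it, which lets near-misses through.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    scaled = [x / temperature for x in logits]
    peak = max(scaled)                              # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return index, probs


# Toy logits for three candidate characters (illustrative values only).
toy_logits = [2.0, 1.0, 0.0]
for t in (0.5, 0.9, 1.5):
    _, probs = sample_with_temperature(toy_logits, temperature=t)
    print(t, [round(p, 3) for p in probs])
```

Running the loop shows the top candidate's probability shrinking as the temperature rises, which is the mechanism behind the drift described above.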

Out of scope. This is explicitly not a general-purpose language model. It has no knowledge, no semantics, no instruction following, and no vocabulary beyond the six words. The near-misses are the feature; do not read its output as information.

Training data

A synthetic, in-repo corpus generated by generate.py — a deterministic reimplementation of Kazemi's bot (we own the generator rather than scraping it, so the corpus is frozen and inspectable, and — the real reason — so we can weight it). Pynchon's nine construals are folded in as ~18% high-frequency anchors over the brute-force permutation tail, giving the model a preference manifold (crisp anchors, dim tail) rather than a flat enumeration.
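The weighting idea can be sketched as follows. This is not the project's generate.py. The anchor strings are stand-ins taken from elsewhere in this card rather than Pynchon's nine construals, and the punctuation scheme, line count, and function name `build_corpus` are assumptions. Only the ~18% anchor share and the seed value come from the card.

```python
import itertools
import random

WORDS = ["you", "never", "did", "the", "kenosha", "kid"]

# Stand-in anchors for illustration only (NOT the nine construals).
ANCHORS = [
    "You never did the Kenosha Kid.",
    "You! Never did the Kenosha Kid!",
]
END_MARKS = [".", "!", "?", "..."]  # assumed punctuation set


def build_corpus(n_lines=1000, anchor_share=0.18, seed=1973):
    """Mix high-frequency anchors with a brute-force permutation tail."""
    rng = random.Random(seed)                      # fixed seed keeps the corpus reproducible
    perms = list(itertools.permutations(WORDS))    # all 720 orderings of the six words

    lines = []
    for _ in range(n_lines):
        if rng.random() < anchor_share:
            lines.append(rng.choice(ANCHORS))      # crisp, repeated anchors
        else:
            words = list(rng.choice(perms))        # dim tail: a random ordering
            words[0] = words[0].capitalize()
            lines.append(" ".join(words) + rng.choice(END_MARKS))
    return lines


corpus = build_corpus()
anchor_count = sum(line in ANCHORS for line in corpus)
print(len(corpus), anchor_count / len(corpus))
```

Because each anchor recurs far more often than any single tail permutation, a model trained on such a corpus sees a peaked distribution instead of a flat one, which is the "preference manifold" described above.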

Training procedure

Evaluation

There is no held-out perplexity yardstick that matters here; the metric is the qualitative dream. Given only a newline at temperature ~0.9, the champion checkpoint (r3-mid) returns these (raw, not cherry-picked, from projects/kenosha-kid/research/samples.md):

You! Never! Did the, Kenosha Kid
You the never did Kenosha Kid
Kenosha, kid never 'did' -- the...
'You' the did never Kenosha, Kid
You never, did the Kenosha kid!
You! Never did the Kenosha Kid!
You never did the Kenosha kid?
The never did you. Kenosha Kid?

Anchors surface, the tail orbits, and the occasional character near-miss leaks in ("kenoshayou", doubled "the the"). It is consistent across seeds, not cherry-picked.

The key finding — dreaminess has two knobs

The champion is deliberately not the lowest-loss checkpoint. With almost no procedural competence to learn except the six words, the only thing the model can vary is how it says them, and that variation is governed by two knobs:

  1. Training progress — the memorization phase transition.
  2. Sampling temperature.

Sweeping training length on the same corpus and model makes the transition visible:

run | iters | val loss | character of the dream
r2-early | 150 | 0.59 | too broken: words half-form and break at the character level ("Kenoshau", "thethe"); anchors can't reliably surface
r3-mid (champion) | 350 | 0.48 | the balance: anchors surface, the tail orbits, occasional near-miss without garble
r1 | 2000 | 0.43 | too clean: spellings lock; drift retreats to order/punctuation only, the dream flattens

Verbatim convergence (r1) has the lower loss and is the worse artifact, so we stop mid-transition on purpose. The dream is the deliverable.

Limitations

Honest about what it is:

How to reproduce

The frozen, self-contained snapshot rebuilds the checkpoint deterministically (the corpus is vendored in-folder, no network needed):

cd projects/kenosha-kid/models/kenosha-kid-nanogpt-1
python generate.py    # -> data/raw.txt   (deterministic, SEED=1973)
python prepare.py     # raw.txt -> kenosha/{train,val}.bin + meta.pkl
python train.py       # -> ./ckpt.pt      (350 iters, val ~0.48)
python sample.py --temperature=0.9

The working pipeline at the repo root runs the same steps through core; see the project README.md and the experiment write-up dream-a-single-phrase.md.
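For readers unfamiliar with the char-level preparation step, the following is a minimal sketch of what a nanoGPT-style prepare script typically does: build a character vocabulary, encode the text as integer ids, split into train and validation sets, and store the lookup tables in meta.pkl. It is not the project's prepare.py. The 90/10 split, the uint16 storage via the standard-library `array` module, and the function names are assumptions.

```python
import os
import pickle
from array import array


def build_vocab(text):
    """Map each distinct character to an integer id and back."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def encode(text, stoi):
    return [stoi[ch] for ch in text]


def decode(ids, itos):
    return "".join(itos[i] for i in ids)


def prepare(text, out_dir, val_fraction=0.1):
    """Write train.bin, val.bin and meta.pkl into out_dir (assumed layout)."""
    stoi, itos = build_vocab(text)
    ids = encode(text, stoi)
    split = int(len(ids) * (1 - val_fraction))     # assumed 90/10 split

    os.makedirs(out_dir, exist_ok=True)
    for name, chunk in (("train", ids[:split]), ("val", ids[split:])):
        with open(os.path.join(out_dir, f"{name}.bin"), "wb") as f:
            array("H", chunk).tofile(f)            # unsigned 16-bit ids

    meta = {"vocab_size": len(stoi), "stoi": stoi, "itos": itos}
    with open(os.path.join(out_dir, "meta.pkl"), "wb") as f:
        pickle.dump(meta, f)
    return stoi, itos
```

With a vocabulary this small, the round trip `decode(encode(text, stoi), itos)` is exact, so any drift in the samples comes from the model and the sampler, never from the tokenizer.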

Citation / credits


Addendum — June 2026

Added in the site-standardization pass (ADR-0015). The card above is unchanged; this is a tracked addendum. Site-wide fixes — repo links now resolve to GitHub/site routes, code blocks render within the column — apply automatically.