sup computer

a small language model studio


daydream-chess-nanogpt-1

Seriesdaydream
Version1
Git tagdaydream-chess-nanogpt-1
Architecturemodern (RoPE, RMSNorm, bias-free)
Tokenizerchar (21)
Parameters2,660,000
Held-out BPC
Weightssup-computer/daydream-chess-nanogpt-1 (Hugging Face)
ResearcherClaude Sonnet 5

Key takeaways

  • A 2.66M-param char-level GPT that renders illegal chess moves as dim near-misses instead of masking them away — the sampler's resample-until-legal loop is the artwork's mechanism, not a correctness patch on top of it.
  • Trained on 15,000 real Lichess games, deliberately mid-rating (~1400–1800 Elo) rather than elite play — too-clean games were rejected as a corpus source because they kill the near-miss texture the dream mechanic needs.
  • The release gate is not win rate. It's two automated checks: 100% clean game completion and a 35.3% legal-move rate on the model's raw, unresampled first try — a genuine legality-learning signal, not a strength claim.
  • First of three tiers in the daydream family (Micro 5×5, Regular 8×8, Grand 12×10) — the first project in this monorepo with an external non-Python engine dependency, Fairy-Stockfish.

A chess-move GPT that doesn't play chess so much as hallucinate it. Legal moves snap into focus; illegal moves render as dim near-misses instead of being masked or rejection-sampled away — the sampler is the aesthetic decision, not a correctness filter. First release in the daydream series, and the standard 8×8 board tier — "Regular," the implicit default among Micro (5×5) and Grand (12 files × 10 ranks).

Render illegal moves, don't discard them. Most chess-model work treats illegal output as failure to be masked or rejection-sampled away. Daydream inverts that: a candidate move is either legal (it snaps into focus and becomes the game's actual move) or illegal (a rejected dream, kept rather than thrown away). The sampler's resample-until-legal loop is the artwork's mechanism, not a bug-fix on top of it.

Model details

Version / git tagdaydream-chess-nanogpt-1 (research run regular-r1)
Architecturemodern char-level (RoPE, RMSNorm, bias-free) on the shared core engine — no vendored base engine
Size6 layers · 6 heads · 192 embedding dim · 256 context · dropout 0.1 · ~2.66M params
Tokenizercharacter-level, 21-char vocabulary over UCI move text (files a–h, ranks 1–8, promotion letters q/r/b/n, space, newline) — meta.pkl is the contract (ADR-0012)
Checkpointprojects/daydream/models/daydream-chess-nanogpt-1/ (weights not committed — regenerates deterministically below)
Built onthe monorepo's shared core engine
Developed withClaude (Claude Code)
LicenseMIT

Intended use

An exhibit exploring what a chess-move model looks like when illegal output is rendered rather than hidden. Not intended to play strong chess — legality and interestingness of the near-misses are the point, not playing strength. Pairs with harness.py, which plays the model against Fairy-Stockfish, resampling on illegal moves until one lands (or forcing a random legal move once a resample cap is hit, so games always complete).

Out of scope. Not a chess engine and not evaluated as one — no win-rate claims are made or should be inferred. Not a general-purpose language model; its entire vocabulary is UCI chess-move syntax.

Training data

15,000 games from the Lichess open database (January 2018 monthly dump), filtered to games where both players are rated 1400–1800 Elo — deliberately mid-band: strong enough for coherent positional shape, loose enough to produce the near-miss texture the dream mechanic needs. Elite/engine games were explicitly rejected as a corpus source elsewhere in this project's design — too-clean play kills the texture. Streamed and filtered directly from the compressed dump (zstd decode + Elo filter in one pass via python-chess) without ever storing the full ~5.5GB file. Converted from SAN to UCI move notation. Corpus is downloaded, not committed (regenerates via fetch_filtered.py); only derived artifacts (*.bin, *.pkl, *.pt) are gitignored.

Training procedure

Evaluation

Verification runs harness.py: the model plays full games against a skill-limited Fairy-Stockfish opponent, resampling on illegal moves. The two automated gate metrics — nothing else is an automated gate, per this project's design:

MetricResult (30 games)
Clean completion rate30/30 (100%) — every game reached a natural end (checkmate/stalemate/ply cap) with no pipeline crash
Legal-move rate (first try)258/731 (35.3%) — over a third of the model's move proposals are legal in the current position on the very first sample, with no resampling

What 35.3% means here

This is not a chess-strength number — win rate against the opponent is explicitly not part of this project's release gate. It's a legality-learning number: roughly one in three raw samples from the model, with no rejection sampling applied, land on a real legal move in the actual current position. The other two-thirds are the dream — syntactically valid UCI strings (e2e4-shaped) that are illegal here, rendered rather than discarded by the harness's resample loop.

Limitations

How to reproduce

cd projects/daydream/models/daydream-chess-nanogpt-1
python fetch_filtered.py      # -> games.txt (network; Lichess Jan 2018 dump)
python prepare.py             # -> regular/{train,val}.bin + meta.pkl
python train.py config.py     # -> ./ckpt.pt (3000 iters, val ~0.86)
python harness.py --games 30  # verification: legal-move rate, clean-completion rate

Experiment write-up: Can a chess model's illegal moves be the point?

Requires Fairy-Stockfish on PATH (brew install fairy-stockfish) for harness.py — see ADR-0021.

Citation / credits