daydream-chess-nanogpt-1

Series	daydream
Version	1
Git tag	daydream-chess-nanogpt-1
Architecture	modern (RoPE, RMSNorm, bias-free)
Tokenizer	char (21)
Parameters	2,660,000
Held-out BPC	—
Weights	sup-computer/daydream-chess-nanogpt-1 (Hugging Face)
Researcher	Claude Sonnet 5

Key takeaways

A 2.66M-param char-level GPT that renders illegal chess moves as dim near-misses instead of masking them away — the sampler's resample-until-legal loop is the artwork's mechanism, not a correctness patch on top of it.
Trained on 15,000 real Lichess games, deliberately mid-rating (~1400–1800 Elo) rather than elite play — too-clean games were rejected as a corpus source because they kill the near-miss texture the dream mechanic needs.
The release gate is not win rate. It's two automated checks: 100% clean game completion and a 35.3% legal-move rate on the model's raw, unresampled first try — a genuine legality-learning signal, not a strength claim.
The first release of three tiers in the daydream family (Micro 5×5, Regular 8×8, Grand 12×10), and the first project in this monorepo with an external non-Python engine dependency, Fairy-Stockfish.

A chess-move GPT that doesn't play chess so much as hallucinate it. Legal moves snap into focus; illegal moves render as dim near-misses instead of being masked or rejection-sampled away: the sampler is the aesthetic decision, not a correctness filter. This is the first release in the daydream series and its standard 8×8 tier, "Regular," the implicit default alongside Micro (5×5) and Grand (12 files × 10 ranks).

Render illegal moves, don't discard them. Most chess-model work treats illegal output as failure to be masked or rejection-sampled away. Daydream inverts that: a candidate move is either legal (it snaps into focus and becomes the game's actual move) or illegal (a rejected dream, kept rather than thrown away). The sampler's resample-until-legal loop is the artwork's mechanism, not a bug-fix on top of it.

Model details


Version / git tag	`daydream-chess-nanogpt-1` (research run `regular-r1`)
Architecture	modern char-level (RoPE, RMSNorm, bias-free) on the shared `core` engine — no vendored base engine
Size	6 layers · 6 heads · 192 embedding dim · 256 context · dropout 0.1 · ~2.66M params
Tokenizer	character-level, 21-char vocabulary over UCI move text (files a–h, ranks 1–8, promotion letters q/r/b/n, space, newline) — `meta.pkl` is the contract (ADR-0012)
Checkpoint	`projects/daydream/models/daydream-chess-nanogpt-1/` (weights not committed; regenerated deterministically via the steps under How to reproduce)
Built on	the monorepo's shared `core` engine
Developed with	Claude (Claude Code)
License	MIT
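
As a sanity check on the Size and Tokenizer rows above, the following sketch rebuilds the 21-character vocabulary and estimates the parameter count. It assumes tied input/output embeddings, a plain two-layer MLP with 4× expansion, and weight-only RMSNorm (two per block plus a final one). None of these details are stated in this card, so the shared `core` engine may differ, and `estimate_params` is a hypothetical helper, not part of the repo.

```python
# Vocabulary: UCI move text only. "b" is both a file and a promotion letter,
# so the four promotion letters add just three new characters.
files = set("abcdefgh")
ranks = set("12345678")
promotions = set("qrbn")
vocab = sorted(files | ranks | promotions | {" ", "\n"})
vocab_size = len(vocab)


def estimate_params(n_layer=6, n_embd=192, vocab_size=21, mlp_ratio=4):
    """Rough parameter count under the assumptions named above (hypothetical layout)."""
    embedding = vocab_size * n_embd                      # assumed tied with the output head
    attention = 3 * n_embd * n_embd + n_embd * n_embd    # QKV projection + output projection
    mlp = 2 * mlp_ratio * n_embd * n_embd                # up- and down-projection, no biases
    norms = 2 * n_embd                                   # two weight-only RMSNorms per block
    final_norm = n_embd
    return embedding + n_layer * (attention + mlp + norms) + final_norm


print(vocab_size)                               # 21
print(estimate_params(vocab_size=vocab_size))   # 2660736
```

Under these assumptions the estimate comes to 2,660,736, which is consistent with the stated ~2.66M. RoPE contributes no learned parameters, and the head count does not change the total.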

Intended use

An exhibit exploring what a chess-move model looks like when illegal output is rendered rather than hidden. Not intended to play strong chess — legality and interestingness of the near-misses are the point, not playing strength. Pairs with harness.py, which plays the model against Fairy-Stockfish, resampling on illegal moves until one lands (or forcing a random legal move once a resample cap is hit, so games always complete).
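
The resample-until-legal behaviour described above might look roughly like the sketch below. It is not the code in harness.py: `play_ply`, `max_tries`, and the default cap of 16 are hypothetical names and values, the model is stood in for by a `propose` callable that returns one UCI string per call, and the legal-move list is assumed to be non-empty (the caller handles finished games).

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def play_ply(
    propose: Callable[[], str],
    legal_moves: Sequence[str],
    max_tries: int = 16,                      # hypothetical cap, not the harness's value
    rng: Optional[random.Random] = None,
) -> Tuple[str, List[str], bool]:
    """Return (move_played, rejected_dreams, was_forced) for one model ply."""
    rng = rng or random.Random()
    legal = set(legal_moves)
    dreams: List[str] = []                    # illegal candidates are kept, not discarded
    for _ in range(max_tries):
        candidate = propose()
        if candidate in legal:
            return candidate, dreams, False   # the move "snaps into focus"
        dreams.append(candidate)
    # Cap reached: force a uniformly random legal move so the game still completes.
    return rng.choice(sorted(legal)), dreams, True


# Usage with a stub proposer in place of the model.
samples = iter(["e2e5", "g1g3", "e2e4"])
move, dreams, forced = play_ply(lambda: next(samples), ["e2e4", "d2d4"], max_tries=8)
print(move, dreams, forced)   # e2e4 ['e2e5', 'g1g3'] False
```

Returning the rejected candidates alongside the played move is what lets a renderer show them as near-misses; the `was_forced` flag keeps the safety-valve move distinguishable from the model's own output.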

Out of scope. Not a chess engine and not evaluated as one — no win-rate claims are made or should be inferred. Not a general-purpose language model; its entire vocabulary is UCI chess-move syntax.

Training data

15,000 games from the Lichess open database (January 2018 monthly dump), filtered to games where both players are rated 1400–1800 Elo. The band is deliberately mid-range: strong enough for coherent positional shape, loose enough to produce the near-miss texture the dream mechanic needs. The project's design explicitly rejects elite and engine games as a corpus source, because too-clean play kills that texture. Games are streamed and filtered directly from the compressed dump (zstd decode + Elo filter in one pass via python-chess) without ever storing the full ~5.5 GB file, then converted from SAN to UCI move notation. The corpus is downloaded, not committed (it regenerates via fetch_filtered.py); derived artifacts (*.bin, *.pkl, *.pt) are gitignored as well.
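
The Elo filter can be illustrated with a standard-library-only sketch. The real fetch_filtered.py streams the zstd dump and parses games with python-chess; this version only shows the band check on PGN tag lines. `parse_headers` and `both_in_band` are hypothetical helpers, and treating the 1400–1800 bounds as inclusive is an assumption.

```python
import re

# PGN tag pairs look like: [WhiteElo "1543"]
TAG_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')


def parse_headers(pgn_header_lines):
    """Collect PGN tag pairs from header lines into a dict."""
    headers = {}
    for line in pgn_header_lines:
        match = TAG_RE.match(line)
        if match:
            headers[match.group(1)] = match.group(2)
    return headers


def both_in_band(headers, lo=1400, hi=1800):
    """True only if both ratings are present, numeric, and inside [lo, hi]."""
    try:
        white = int(headers["WhiteElo"])
        black = int(headers["BlackElo"])
    except (KeyError, ValueError):
        return False          # missing or unrated ("?") players are skipped
    return lo <= white <= hi and lo <= black <= hi


game = parse_headers(['[Event "Rated Blitz game"]', '[WhiteElo "1543"]', '[BlackElo "1712"]'])
print(both_in_band(game))     # True
```

Filtering on headers alone means a game's moves only need to be parsed and converted once it has already passed the band check.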

Training procedure

Optimizer: AdamW, LR 3e-4 with cosine decay to 3e-5, 100 warmup iters, β₂ 0.99, batch size 64.
Run: 3,000 iterations, best val loss 0.858 (always_save_checkpoint).
Hardware: Apple Silicon Mac (MPS / Metal backend), torch.compile disabled.
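
The learning-rate schedule listed above (linear warmup, then cosine decay from 3e-4 to 3e-5) can be sketched as follows. The function follows the common nanoGPT-style formulation; that the decay spans all 3,000 iterations is an assumption, and `lr_at` is a hypothetical name rather than a function from the `core` engine.

```python
import math


def lr_at(it, max_lr=3e-4, min_lr=3e-5, warmup_iters=100, decay_iters=3000):
    """Learning rate at iteration `it`: linear warmup, then cosine decay to min_lr."""
    if it < warmup_iters:
        return max_lr * (it + 1) / (warmup_iters + 1)      # linear ramp up
    if it >= decay_iters:
        return min_lr                                       # floor after decay ends
    progress = (it - warmup_iters) / (decay_iters - warmup_iters)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))      # goes from 1 down to 0
    return min_lr + coeff * (max_lr - min_lr)


for step in (0, 100, 1550, 3000):
    print(step, f"{lr_at(step):.2e}")
```

The rate peaks at 3e-4 when warmup ends at iteration 100, passes through the midpoint value of 1.65e-4 at iteration 1,550, and reaches the 3e-5 floor at iteration 3,000.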

Evaluation

Verification runs harness.py: the model plays full games against a skill-limited Fairy-Stockfish opponent, resampling on illegal moves. The two metrics below are the only automated gates, per this project's design:

Metric	Result (30 games)
Clean completion rate	30/30 (100%) — every game reached a natural end (checkmate/stalemate/ply cap) with no pipeline crash
Legal-move rate (first try)	258/731 (35.3%) — over a third of the model's move proposals are legal in the current position on the very first sample, with no resampling

What 35.3% means here

This is not a chess-strength number: win rate against the opponent is explicitly not part of this project's release gate. It's a legality-learning number. Roughly one in three raw samples from the model, with no rejection sampling applied, lands on a legal move in the actual current position. The other two-thirds are the dream: syntactically valid UCI strings (e2e4-shaped) that are illegal here, rendered rather than discarded by the harness's resample loop.
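
For reference, the two gate figures follow directly from the counts in the table. The snippet below is only that arithmetic; the variable names are illustrative and do not come from harness.py.

```python
first_try_legal = 258        # first-try proposals that were legal
first_try_proposals = 731    # all first-try proposals across the 30 games
games_completed = 30
games_played = 30

legal_rate = first_try_legal / first_try_proposals
dream_rate = 1 - legal_rate                      # illegal first tries
completion_rate = games_completed / games_played

print(f"legal-move rate (first try): {legal_rate:.1%}")       # 35.3%
print(f"illegal first tries:         {dream_rate:.1%}")       # 64.7%
print(f"clean completion rate:       {completion_rate:.0%}")  # 100%
```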

Limitations

Not evaluated for playing strength, deliberately — this project's gate is legality and completion, not win rate.
Legality is learned, not guaranteed. Even the harness's resample loop has a cap; beyond it, a uniformly random legal move is forced so the game still completes. That fallback move is not the model's "dream" — it's the harness's safety valve.
UCI-only vocabulary. No SAN, no natural language, no commentary — 21 characters, chess moves and nothing else.
No weights in the tree (ADR-0002).

How to reproduce

cd projects/daydream/models/daydream-chess-nanogpt-1
python fetch_filtered.py      # -> games.txt (network; Lichess Jan 2018 dump)
python prepare.py             # -> regular/{train,val}.bin + meta.pkl
python train.py config.py     # -> ./ckpt.pt (3000 iters, val ~0.86)
python harness.py --games 30  # verification: legal-move rate, clean-completion rate

Requires Fairy-Stockfish on PATH (brew install fairy-stockfish) for harness.py; see ADR-0021.

Experiment write-up: Can a chess model's illegal moves be the point?

Citation / credits

The shared core engine (modern nanoGPT lineage — RoPE, RMSNorm, bias-free).
Fairy-Stockfish — legality arbiter and self-play engine for the sibling Micro/Grand tiers.
The Lichess open database — corpus source.
Set up and trained with Claude (Claude Code).