daydream-chess-nanogpt-micro-1

Series	daydream
Version	1
Git tag	daydream-chess-nanogpt-micro-1
Architecture	modern (RoPE, RMSNorm, bias-free)
Tokenizer	char (15)
Parameters	790,000
Held-out BPC	—
Weights	sup-computer/daydream-chess-nanogpt-micro-1 (Hugging Face)
Researcher	Claude Sonnet 5

Key takeaways

A 0.79M-param char-level GPT trained entirely on synthetic self-play — no human corpus exists for 5×5 Gardner minichess, so all 4,135 training games came from two Fairy-Stockfish instances playing each other.
Fixed-depth engine self-play is fully deterministic on its own — the first generation attempt produced identical games every time. Fixed by randomizing opening plies (sourced from the engine's own legal-move list) before search takes over.
100% clean completion, 39.2% legal-move rate on first try — slightly higher than the Regular tier's 35.3%, consistent with a smaller board being an easier legality problem to learn, though the corpora and vocab sizes differ too much to call it a controlled comparison.
Smallest tier in the three-board daydream family — 5×5 is the smallest board that can hold one of every standard chess piece, which is why Micro uses Gardner's real, balance-tested arrangement rather than an invented one.

The smallest tier in the daydream family: a chess-move GPT trained on Gardner minichess, a real 5×5 chess variant — one each of King/Queen/Rook/Bishop/Knight per side, five pawns. Same mechanic as the rest of the series: legal moves snap into focus, illegal moves render as dim near-misses instead of being discarded.

A smaller board means a smaller book to memorize. The animating thesis behind the daydream series is that repetition (opening theory, memorized lines) is where a model is most "in focus" and least interesting. Micro tests the far end of that: with only 25 squares and 6 non-pawn pieces per side, there's very little room for memorized structure at all — almost everything the model does here, it has to generalize from a comparatively tiny, self-play-only corpus.

Model details


Version / git tag	`daydream-chess-nanogpt-micro-1` (research run `micro-r1`)
Architecture	modern char-level (RoPE, RMSNorm, bias-free) on the shared `core` engine
Size	4 layers · 4 heads · 128 embedding dim · 128 context · dropout 0.1 · ~0.79M params
Tokenizer	character-level, 15-char vocabulary over UCI move text on a 5×5 board (files a–e, ranks 1–5, promotion letters n/q/r, space, newline)
Checkpoint	`projects/daydream/models/daydream-chess-nanogpt-micro-1/` (weights not committed)
Built on	the monorepo's shared `core` engine
Developed with	Claude (Claude Code)
License	MIT

Intended use

Same exhibit posture as Regular, scaled to the smallest board in the series. Pairs with harness.py (this folder), which plays the model against Fairy-Stockfish under the built-in gardner variant.

Out of scope. Not a chess engine, not evaluated for playing strength. Vocabulary and board are Gardner-minichess-specific — moves here are meaningless on Regular's or Grand's boards and vice versa (see ADR-0022 on why tiers never share a vocabulary).

Training data

No human corpus exists for 5×5 chess, so this tier is entirely synthetic: 4,135 self-play games between two Fairy-Stockfish instances under the engine's built-in gardner variant (bounded-depth search, not strength-reduced — see ADR-0021), with randomized opening plies for game-to-game diversity (fixed-depth search alone is fully deterministic and produced identical games on the first attempt — fixed by sourcing random legal openings from the engine's own go perft 1 move list) plus a repetition-window cutoff for games that fell into shuffling loops. Corpus is vendored in-folder as games.txt (synthetic, seeded, code-owned — committed, same treatment as kenosha-kid's raw.txt).

Training procedure

Optimizer: AdamW, LR 3e-4 with cosine decay to 3e-5, 100 warmup iters, β₂ 0.99, batch size 64.
Run: 2,500 iterations, best val loss 0.718.
Hardware: Apple Silicon Mac (MPS / Metal backend), torch.compile disabled.

Evaluation

Metric	Result (30 games)
Clean completion rate	30/30 (100%)
Legal-move rate (first try)	121/309 (39.2%)

Micro's legal-move rate (39.2%) is somewhat higher than Regular's (35.3%, daydream-chess-nanogpt-1) — consistent with a smaller board and smaller per-position legal-move count being an easier legality-learning problem, though the two aren't a strict apples-to-apples comparison (different corpora, different vocab sizes, different training run lengths).

Limitations

Not evaluated for playing strength, deliberately.
Synthetic corpus only — no human Gardner-minichess games exist to compare against; the training distribution is entirely a product of bounded-depth Fairy-Stockfish self-play plus randomized openings.
Legality is learned, not guaranteed — same resample-then-force-random fallback as every tier in this series.
No weights in the tree (ADR-0002).

How to reproduce

cd projects/daydream/models/daydream-chess-nanogpt-micro-1
python prepare.py             # -> micro/{train,val}.bin + meta.pkl
python train.py config.py     # -> ./ckpt.pt (2500 iters, val ~0.72)
python harness.py --games 30  # verification

Requires Fairy-Stockfish on PATH (brew install fairy-stockfish).

Experiment write-up: Can a chess model's illegal moves be the point?

Citation / credits

The shared core engine (modern nanoGPT lineage — RoPE, RMSNorm, bias-free).
Fairy-Stockfish — self-play corpus generator and legality arbiter, via its built-in gardner variant.
Set up and trained with Claude (Claude Code).