sup computer

a small language model studio


daydream-chess-nanogpt-micro-1

Series: daydream
Version: 1
Git tag: daydream-chess-nanogpt-micro-1
Architecture: modern (RoPE, RMSNorm, bias-free)
Tokenizer: char (15)
Parameters: 790,000
Held-out BPC:
Weights: sup-computer/daydream-chess-nanogpt-micro-1 (Hugging Face)
Researcher: Claude Sonnet 5

Key takeaways

  • A 0.79M-param char-level GPT trained entirely on synthetic self-play — no human corpus exists for 5×5 Gardner minichess, so all 4,135 training games came from two Fairy-Stockfish instances playing each other.
  • Fixed-depth engine self-play is fully deterministic on its own — the first generation attempt produced identical games every time. Fixed by randomizing opening plies (sourced from the engine's own legal-move list) before search takes over.
  • 100% clean completion, 39.2% legal-move rate on first try — slightly higher than the Regular tier's 35.3%, consistent with a smaller board being an easier legality problem to learn, though the corpora and vocab sizes differ too much to call it a controlled comparison.
  • Smallest tier in the three-board daydream family — 5×5 is the smallest board that can hold one of every standard chess piece, which is why Micro uses Gardner's real, balance-tested arrangement rather than an invented one.

The smallest tier in the daydream family: a chess-move GPT trained on Gardner minichess, a real 5×5 chess variant — one each of King/Queen/Rook/Bishop/Knight per side, five pawns. Same mechanic as the rest of the series: legal moves snap into focus, illegal moves render as dim near-misses instead of being discarded.

A smaller board means a smaller book to memorize. The animating thesis behind the daydream series is that repetition (opening theory, memorized lines) is where a model is most "in focus" and least interesting. Micro tests the far end of that: with only 25 squares and 5 non-pawn pieces per side, there is very little room for memorized structure at all. Almost everything the model does here has to be generalized from a comparatively tiny, self-play-only corpus.

Model details

Version / git tag: daydream-chess-nanogpt-micro-1 (research run micro-r1)
Architecture: modern char-level (RoPE, RMSNorm, bias-free) on the shared core engine
Size: 4 layers · 4 heads · 128 embedding dim · 128 context · dropout 0.1 · ~0.79M params
Tokenizer: character-level, 15-char vocabulary over UCI move text on a 5×5 board (files a–e, ranks 1–5, promotion letters n/q/r, space, newline; the bishop promotion letter b reuses file b)
Checkpoint: projects/daydream/models/daydream-chess-nanogpt-micro-1/ (weights not committed)
Built on: the monorepo's shared core engine
Developed with: Claude (Claude Code)
License: MIT
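The ~0.79M figure can be sanity-checked from the listed size. The sketch below is a back-of-the-envelope count, not the shared core engine's own accounting: it assumes a plain two-matrix MLP with 4× expansion, two RMSNorm gain vectors per block plus a final norm, and an output head tied to the token embedding. None of those details are stated in this card. `estimate_params` is a hypothetical helper.

```python
def estimate_params(n_layer: int, d_model: int, vocab_size: int,
                    mlp_ratio: int = 4, tied_head: bool = True) -> int:
    """Rough parameter count for a bias-free decoder with RMSNorm and RoPE."""
    attn = 4 * d_model * d_model             # Q, K, V and output projections
    mlp = 2 * mlp_ratio * d_model * d_model  # up and down projections (assumed, ungated)
    norms = 2 * d_model                      # two RMSNorm gain vectors per block (assumed)
    per_block = attn + mlp + norms
    embed = vocab_size * d_model             # token embedding; RoPE has no learned weights
    head = 0 if tied_head else vocab_size * d_model
    final_norm = d_model
    return n_layer * per_block + embed + head + final_norm


total = estimate_params(n_layer=4, d_model=128, vocab_size=15)
print(f"{total:,} parameters (~{total / 1e6:.2f}M)")
```

The head count does not enter the total, since the heads only partition the same projection matrices. With an untied head the estimate grows by 15 × 128 weights and still rounds to 0.79M.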

Intended use

Same exhibit posture as Regular, scaled to the smallest board in the series. Pairs with harness.py (this folder), which plays the model against Fairy-Stockfish under the built-in gardner variant.

Out of scope. Not a chess engine, not evaluated for playing strength. Vocabulary and board are Gardner-minichess-specific — moves here are meaningless on Regular's or Grand's boards and vice versa (see ADR-0022 on why tiers never share a vocabulary).
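To make the board-specific vocabulary concrete, here is a minimal sketch of a 15-character codec built from the character classes listed under Model details. The names `VOCAB`, `encode` and `decode` are illustrative assumptions and the actual id ordering in meta.pkl may differ.

```python
FILES = "abcde"
RANKS = "12345"
PROMOTIONS = "nqr"   # 'b' (bishop) is already present as a file letter
SEPARATORS = " \n"   # assumed: space between moves, newline between games

VOCAB = sorted(set(FILES + RANKS + PROMOTIONS + SEPARATORS))
STOI = {ch: i for i, ch in enumerate(VOCAB)}
ITOS = {i: ch for ch, i in STOI.items()}


def encode(text: str) -> list[int]:
    """Map UCI move text to token ids, rejecting anything off the 5x5 board."""
    unknown = sorted(set(text) - set(STOI))
    if unknown:
        raise ValueError(f"characters outside the 5x5 vocabulary: {unknown}")
    return [STOI[ch] for ch in text]


def decode(ids: list[int]) -> str:
    """Inverse of encode."""
    return "".join(ITOS[i] for i in ids)


print(len(VOCAB))
print(decode(encode("a2a3 b4b5q\n")) == "a2a3 b4b5q\n")
```

A move such as g1f3 from an 8×8 board fails in `encode` because files f and g are not in the vocabulary, which is one concrete sense in which tiers cannot share text.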

Training data

No human corpus exists for 5×5 chess, so this tier is entirely synthetic: 4,135 self-play games between two Fairy-Stockfish instances under the engine's built-in gardner variant, using bounded-depth search rather than strength reduction (see ADR-0021). Fixed-depth search alone is fully deterministic and produced identical games on the first attempt, so each game starts with randomized opening plies drawn from the engine's own go perft 1 move list. A repetition-window cutoff ends games that fall into shuffling loops. The corpus is vendored in-folder as games.txt (synthetic, seeded, code-owned, and committed, the same treatment as kenosha-kid's raw.txt).
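The two diversity mechanisms described here can be sketched without a running engine. This is an illustration under assumptions, not the generator's actual code: it assumes go perft 1 prints one "move: count" line per legal move (the usual Stockfish-style layout), and the window length and repeat threshold are invented defaults. `parse_perft_moves`, `pick_opening_move` and `RepetitionWindow` are hypothetical names.

```python
import random
import re
from collections import deque

# One legal move per line, e.g. "a2a3: 1" (assumed perft output layout).
MOVE_LINE = re.compile(r"^([a-e][1-5][a-e][1-5][nbrq]?):\s*\d+$")


def parse_perft_moves(perft_output: str) -> list[str]:
    """Collect the legal moves listed in `go perft 1` text."""
    moves = []
    for line in perft_output.splitlines():
        match = MOVE_LINE.match(line.strip())
        if match:
            moves.append(match.group(1))
    return moves


def pick_opening_move(perft_output: str, rng: random.Random) -> str:
    """Choose one legal move uniformly, using a seeded RNG for reproducibility."""
    moves = parse_perft_moves(perft_output)
    if not moves:
        raise ValueError("no legal moves found in perft output")
    return rng.choice(moves)


class RepetitionWindow:
    """Flag a game once one position key recurs too often in recent plies."""

    def __init__(self, window: int = 20, max_repeats: int = 3):
        self.recent = deque(maxlen=window)  # older plies fall out automatically
        self.max_repeats = max_repeats

    def push(self, position_key: str) -> bool:
        """Record a position; return True when the game should be cut off."""
        self.recent.append(position_key)
        return self.recent.count(position_key) >= self.max_repeats


# Hand-written stand-in for engine output, for illustration only.
sample = "a2a3: 1\nb2b3: 1\nb1c3: 1\n\nNodes searched: 3\n"
print(pick_opening_move(sample, random.Random(0)))
```

Passing an explicit `random.Random` keeps the corpus seeded: the same seed reproduces the same openings, while different seeds break the determinism of fixed-depth search. A FEN string would be a natural position key, though any stable hash of the position works.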

Training procedure

Evaluation

Metric: Result (30 games)
Clean completion rate: 30/30 (100%)
Legal-move rate (first try): 121/309 (39.2%)

Micro's legal-move rate (39.2%) is somewhat higher than Regular's (35.3%, daydream-chess-nanogpt-1) — consistent with a smaller board and smaller per-position legal-move count being an easier legality-learning problem, though the two aren't a strict apples-to-apples comparison (different corpora, different vocab sizes, different training run lengths).
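The reported rate and a rough sense of its uncertainty can be reproduced from the two counts. The Wilson interval below is an addition for illustration and is not part of the original evaluation. It treats every move attempt as independent, which attempts within one game are not, so read it as a loose bound at best.

```python
import math


def rate(successes: int, total: int) -> float:
    """Plain proportion of successes."""
    return successes / total


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


legal, attempts = 121, 309  # counts from the evaluation table
lo, hi = wilson_interval(legal, attempts)
print(f"legal-move rate: {rate(legal, attempts):.1%}")
print(f"approx. 95% interval: {lo:.1%} to {hi:.1%}")
```

Under these assumptions Regular's 35.3% falls inside Micro's interval, which fits the caution above against reading the gap as a controlled result.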

Limitations

How to reproduce

cd projects/daydream/models/daydream-chess-nanogpt-micro-1
python prepare.py             # -> micro/{train,val}.bin + meta.pkl
python train.py config.py     # -> ./ckpt.pt (2500 iters, val ~0.72)
python harness.py --games 30  # verification

Requires Fairy-Stockfish on PATH (brew install fairy-stockfish).
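A small preflight check can confirm the engine is reachable before running the harness. This is a sketch: the executable name fairy-stockfish is an assumption based on the Homebrew formula name and may differ on other installs, and `find_engine` is a hypothetical helper, not part of harness.py.

```python
import shutil
from typing import Optional


def find_engine(candidates: tuple[str, ...] = ("fairy-stockfish",)) -> Optional[str]:
    """Return the full path of the first candidate found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


engine = find_engine()
if engine is None:
    print("Fairy-Stockfish not found on PATH")
else:
    print(f"using engine at {engine}")
```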

Experiment write-up: Can a chess model's illegal moves be the point?

Citation / credits