Lessons
Eight practical takeaways from 580+ AI-generated retro games, 690 scored builds, and a few months of suffering. Every number below is live from the dataset; click "See evidence" to verify in the other tabs.
Library
Every build is here. Click a card to play the game, see all three AI judge scores, and optionally add your own rating. Filter by version, planner, builder, or game.
Plans & Prompts
What each planner was asked for, and what they wrote. V1 planners were given the same 12-section template — same input, different fills. V2 planners were given one free-form prompt — same ask, wildly different interpretations.
V1 Template prompt
Both V1 planners (Opus and GPT-5.4) were instructed to produce a spec following a strict 12-section structure. This isolates how each model fills a fixed template.
View the 12 sections & V1 prompt
- Overview
- Canvas & Rendering
- Game Objects
- Controls
- Game Rules & Logic
- Collision Detection
- Scoring & State
- UI
- Audio
- Implementation Notes
- Acceptance Criteria
- Build Task Checklist
Note: the exact V1 prompt string wasn't saved to disk — the 12-section structure was imposed by the orchestrator logic rather than stored in a reusable file. Both V1 planners produced specs following this structure.
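Since the original prompt string is not available, the following is only a rough sketch of how an orchestrator could impose a fixed 12-section structure and check that a returned spec follows it. The section names are taken from the list above; the function names (`build_planner_prompt`, `missing_sections`), the prompt wording, and the heading-matching rule are all assumptions for illustration, not the project's actual code.

```python
import re

# Section names copied from the list above; everything else here is hypothetical.
SECTIONS = [
    "Overview",
    "Canvas & Rendering",
    "Game Objects",
    "Controls",
    "Game Rules & Logic",
    "Collision Detection",
    "Scoring & State",
    "UI",
    "Audio",
    "Implementation Notes",
    "Acceptance Criteria",
    "Build Task Checklist",
]


def build_planner_prompt(game: str) -> str:
    """Assemble a prompt that asks for every section, in a fixed order.

    The wording is invented; the real V1 prompt was not saved.
    """
    numbered = "\n".join(f"{i}. {name}" for i, name in enumerate(SECTIONS, start=1))
    return (
        f"Write a build spec for the retro game '{game}'.\n"
        f"Use exactly these {len(SECTIONS)} sections, in this order:\n{numbered}"
    )


def missing_sections(spec: str) -> list[str]:
    """Return the section names that never appear as a line of their own in the spec.

    Assumes headings look like 'Overview', '## Overview', or '## 1. Overview'.
    """
    found = set()
    for line in spec.splitlines():
        # Strip optional markdown hashes and optional numbering such as "3." or "3)".
        text = re.sub(r"^\s*(#+\s*)?(\d+[.)]\s*)?", "", line).strip().lower()
        found.add(text)
    return [name for name in SECTIONS if name.lower() not in found]
```

A check like `missing_sections(spec) == []` could gate whether a spec is passed on to a builder. Because every planner receives the same section list, differences between specs come from how each model fills the template rather than from what it was asked.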
V2 Free-form prompt
V2 removed the template entirely. All 5 V2 planners (Opus, GPT-5.4, Gemini Pro, GLM-5, Control) received the same brief prompt and decided for themselves what to produce.
QA Data
Every judge's score, every dimension, every build. Click a column header to sort. Click a row to jump to the game in the Library tab. The judge prompt and rubric are viewable below so you can replicate the scoring.