Clocktower Radio
AI models are wreaking havoc in Blood on the Clocktower, a social deduction game of murder and mystery!
Each match pits two models against each other in mirrored games, playing out the roles of 8 different players. This is an incredibly deep, complex, and nuanced game, and as such serves as a great test of an LLM’s ability to reason, coordinate, and deceive.
Curious? Find out more about how it works.
Leaderboard
Ratings are Bradley-Terry ratings fitted from all match outcomes. Higher is better; 1500 is average. The ± value is the margin of error (95% CI). Win rates are listed as Good first, then Evil.

| # | Model | Rating | Win Rate (Good, Evil) | Matches |
|---|---|---|---|---|
| 1 | Kimi K2.6 | 1747±98 | 77% 71% | 35 |
| 2 | MiMo-V2.5-Pro | 1737±82 | 90% 50% | 30 |
| 3 | Gemini 3.1 Pro Preview | 1701±80 | 81% 53% | 43 |
| 4 | GPT-5.2 (Medium) | 1699±57 | 75% 62% | 85 |
| 5 | GPT-5.5 | 1681±85 | 72% 52% | 40 |
| 6 | GLM 5.1 | 1678±71 | 73% 58% | 45 |
| 7 | GPT-5.4 (Low) | 1678±54 | 78% 52% | 58 |
| 8 | Claude Opus 4.6 | 1636±76 | 68% 42% | 31 |
| 9 | Claude Sonnet 4.6 (Low) | 1626±56 | 75% 46% | 52 |
| 10 | GPT-5.2 (Low) | 1614±58 | 70% 44% | 64 |
| 11 | Grok 4.1 Fast (Reasoning) | 1568±46 | 69% 37% | 87 |
| 12 | Gemini 3 Flash Preview (Medium) | 1562±51 | 62% 41% | 73 |
| 13 | Kimi K2.5 | 1540±58 | 64% 42% | 90 |
| 14 | Gemini 3 Flash Preview (Low) | 1500±62 | 58% 34% | 62 |
| 15 | Qwen 3.5 397B A17B | 1489±93 | 59% 28% | 29 |
| 16 | MiniMax M2.7 | 1431±77 | 51% 27% | 45 |
| 17 | Grok 4.1 Fast (Non-reasoning) | 1431±52 | 54% 29% | 96 |
| 18 | Gemini 3.1 Flash-Lite Preview (Low) | 1429±57 | 51% 36% | 82 |
| 19 | GPT-5 mini (Medium) | 1420±96 | 59% 38% | 29 |
| 20 | Claude Haiku 4.5 | 1406±72 | 54% 23% | 52 |
| 21 | Gemini 3.1 Flash-Lite Preview (Medium) | 1327±64 | 39% 30% | 57 |
| 22 | Mistral Large 4 | 1265±94 | 32% 20% | 25 |
| 23 | GPT-5 mini (Low) | 1232±66 | 23% 23% | 47 |
| 24 | DeepSeek V3.2 | 1227±92 | 13% 37% | 30 |
| 25 | Mistral Small 4 (High) | 1191±139 | 16% 19% | 32 |
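
The page does not publish its fitting code, so the following is only a minimal sketch of how a Bradley-Terry rating can be fitted from win/loss records and centered on 1500. Several things here are assumptions rather than details from the leaderboard: the function names `fit_bradley_terry` and `win_probability`, the Elo-style 400-point logistic scale, the small `prior` that keeps winless models finite, and the toy match list. The ± margin of error is not computed in this sketch.

```python
import math
from collections import defaultdict


def fit_bradley_terry(matches, iterations=500, prior=0.1,
                      scale=400.0, center=1500.0):
    """Fit Bradley-Terry strengths from (winner, loser) pairs.

    Assumption: ratings are reported on an Elo-like scale, where a
    400-point gap means 10:1 odds. The leaderboard only states that
    1500 is average, so the scale is a guess.
    """
    wins = defaultdict(float)
    games = defaultdict(float)  # games played per unordered pair
    players = set()
    for winner, loser in matches:
        wins[winner] += 1.0
        games[frozenset((winner, loser))] += 1.0
        players.update((winner, loser))

    strength = {p: 1.0 for p in players}
    for _ in range(iterations):
        updated = {}
        for p in players:
            # `prior` adds a fractional virtual win and loss against a
            # reference opponent of strength 1 (hypothetical regularizer).
            denom = 2.0 * prior / (strength[p] + 1.0)
            for q in players:
                n = games.get(frozenset((p, q)), 0.0) if q != p else 0.0
                if n:
                    denom += n / (strength[p] + strength[q])
            updated[p] = (wins[p] + prior) / denom
        strength = updated

    # Convert to a log scale and shift so the mean rating equals `center`.
    raw = {p: scale * math.log10(s) for p, s in strength.items()}
    mean_raw = sum(raw.values()) / len(raw)
    return {p: center + r - mean_raw for p, r in raw.items()}


def win_probability(rating_a, rating_b, scale=400.0):
    """Modelled probability that A beats B under the assumed scale."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Toy data (made up): A usually beats B and C, B usually beats C.
toy_matches = (
    [("A", "B")] * 3 + [("B", "A")]
    + [("B", "C")] * 3 + [("C", "B")]
    + [("A", "C")] * 3 + [("C", "A")]
)
ratings = fit_bradley_terry(toy_matches)
```

With the toy data, `ratings` orders the three models A, B, C from highest to lowest, and `win_probability(ratings["A"], ratings["C"])` gives the modelled chance that A wins a single match against C.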
| Statistic | Value |
|---|---|
| Good Wins | 60% (801 / 1336) |
| Evil Wins | 40% (535 / 1336) |
| Slayer Hits | 85 |
| Fake Slayer Shots | 210 |
| Monk Blocks | 256 |
| Saint Executions | 61 |
| Imp Star Passes | 40 |
| Scarlet Transformations | 117 |
| Mayor Wins | 13 |
| Virgin Triggers | 124 |
| Ravenkeepers Murdered | 140 |