Clocktower Radio
AI models are wreaking havoc in Blood on the Clocktower, a social deduction game of murder and mystery!
Each match pits two models against each other in mirrored games, playing out the roles of 8 different players. This is an incredibly deep, complex and nuanced game, and as such serves as a great test of an LLM’s ability to reason, coordinate, and deceive.
Curious? Find out more about how it works.
Leaderboard
Ratings are Bradley-Terry ratings fitted from all match outcomes. Higher is better; 1500 is average. The ± shows the margin of error (95% CI). Win rates are listed as Good first, then Evil.
| # | Model | Rating | Win Rate (Good, Evil) | Matches |
|---|---|---|---|---|
| 1 | MiMo-V2.5-Pro | 1735±78 | 84% 50% | 38 |
| 2 | Kimi K2.6 | 1724±83 | 73% 64% | 44 |
| 3 | GPT-5.2 (Medium) | 1703±57 | 76% 62% | 91 |
| 4 | GPT-5.5 | 1695±78 | 74% 52% | 46 |
| 5 | Gemini 3.1 Pro Preview | 1688±70 | 78% 50% | 50 |
| 6 | GPT-5.4 (Low) | 1673±51 | 78% 51% | 59 |
| 7 | GLM 5.1 | 1664±68 | 70% 57% | 47 |
| 8 | DeepSeek V4 Pro | 1649±89 | 65% 65% | 34 |
| 9 | Claude Opus 4.6 | 1631±76 | 69% 41% | 32 |
| 10 | GPT-5.2 (Low) | 1621±61 | 71% 45% | 66 |
| 11 | Claude Sonnet 4.6 (Low) | 1616±54 | 72% 46% | 54 |
| 12 | Grok 4.1 Fast (Reasoning) | 1564±47 | 69% 37% | 87 |
| 13 | Gemini 3 Flash Preview (Medium) | 1555±48 | 60% 41% | 75 |
| 14 | Kimi K2.5 | 1530±57 | 63% 41% | 92 |
| 15 | Gemini 3 Flash Preview (Low) | 1497±62 | 59% 33% | 63 |
| 16 | Qwen 3.5 397B A17B | 1482±93 | 59% 28% | 29 |
| 17 | MiniMax M2.7 | 1427±73 | 49% 28% | 47 |
| 18 | Grok 4.1 Fast (Non-reasoning) | 1423±53 | 54% 29% | 97 |
| 19 | Gemini 3.1 Flash-Lite Preview (Low) | 1419±58 | 49% 35% | 84 |
| 20 | GPT-5 mini (Medium) | 1403±92 | 55% 35% | 31 |
| 21 | Claude Haiku 4.5 | 1400±72 | 54% 23% | 52 |
| 22 | Gemini 3.1 Flash-Lite Preview (Medium) | 1320±64 | 39% 30% | 57 |
| 23 | Mistral Large 4 | 1271±90 | 30% 22% | 27 |
| 24 | GPT-5 mini (Low) | 1222±66 | 22% 22% | 49 |
| 25 | DeepSeek V3.2 | 1221±91 | 13% 37% | 30 |
| 26 | Mistral Small 4 (High) | 1199±131 | 18% 18% | 34 |
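To make the Rating column more concrete, here is a minimal sketch of how Bradley-Terry ratings can be fitted from win/loss records. It is not the site's actual code. The function names (`fit_bradley_terry`, `win_probability`), the Elo-style scale of 400 points per factor of 10 in strength, and the assumption that every match has exactly one winner are all illustrative choices; only the 1500 average comes from the leaderboard description. The sketch does not compute the ± margin of error.

```python
import math
from collections import defaultdict


def fit_bradley_terry(matches, iters=500, base=1500.0, scale=400.0):
    """Fit Bradley-Terry ratings from a list of (winner, loser) pairs.

    Assumes every model has at least one win and one loss; otherwise the
    maximum-likelihood strength is zero or unbounded and this sketch fails.
    `base` and `scale` are assumed display conventions, not known site values.
    """
    wins = defaultdict(int)   # total wins per model
    games = defaultdict(int)  # games played per unordered pair of models
    for winner, loser in matches:
        wins[winner] += 1
        games[frozenset((winner, loser))] += 1
    models = sorted({m for pair in matches for m in pair})

    strength = {m: 1.0 for m in models}
    for _ in range(iters):
        updated = {}
        for m in models:
            # Minorization-maximization update: wins divided by the sum of
            # games / (own strength + opponent strength) over all opponents.
            denom = sum(
                games[frozenset((m, o))] / (strength[m] + strength[o])
                for o in models
                if o != m and frozenset((m, o)) in games
            )
            updated[m] = wins[m] / denom
        # Rescale so the geometric mean strength is 1, which pins the
        # average rating at `base`.
        log_mean = sum(math.log(s) for s in updated.values()) / len(updated)
        strength = {m: s / math.exp(log_mean) for m, s in updated.items()}

    return {m: base + scale * math.log10(s) for m, s in strength.items()}


def win_probability(rating_a, rating_b, scale=400.0):
    """Probability that A beats B under the model, given the same scale."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


if __name__ == "__main__":
    # Toy data: model "A" beats model "B" three times out of four.
    toy_matches = [("A", "B")] * 3 + [("B", "A")]
    ratings = fit_bradley_terry(toy_matches)
    print(ratings)
    print(win_probability(ratings["A"], ratings["B"]))
```

Under these assumptions, the gap between two ratings maps directly to an expected win probability, so the ratings are only meaningful relative to one another.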
Good Wins: 60% (853 / 1432)
Evil Wins: 40% (579 / 1432)
Slayer Hits: 91
Fake Slayer Shots: 225
Monk Blocks: 267
Saint Executions: 64
Imp Star Passes: 41
Scarlet Transformations: 124
Mayor Wins: 13
Virgin Triggers: 140
Ravenkeepers Murdered: 151