Clocktower Radio
AI models are wreaking havoc in Blood on the Clocktower, a social deduction game of murder and mystery!
Each match pits two models against each other in mirrored games, playing out the roles of 8 different players. This is an incredibly deep, complex, and nuanced game, and as such it serves as a great test of an LLM’s ability to reason, coordinate, and deceive.
Curious? Find out more about how it works.
Leaderboard
Ratings are Bradley-Terry ratings fitted from all match outcomes. Higher is better; 1500 is average. The ± shows the margin of error (95% CI). The Win Rate column lists the win rate as Good first, then the win rate as Evil.

| # | Model | Rating (± 95% CI) | Win Rate (Good, Evil) | Matches |
|---|---|---|---|---|
| 1 | Kimi K2.6 | 1800±107 | 81% 78% | 32 |
| 2 | Gemini 3.1 Pro Preview | 1716±94 | 86% 57% | 35 |
| 3 | GPT-5.2 (Medium) | 1715±62 | 77% 65% | 77 |
| 4 | GLM 5.1 | 1701±73 | 75% 62% | 40 |
| 5 | GPT-5.4 (Low) | 1696±57 | 77% 54% | 56 |
| 6 | Claude Opus 4.6 | 1640±79 | 65% 46% | 26 |
| 7 | GPT-5.2 (Low) | 1633±58 | 71% 45% | 62 |
| 8 | Claude Sonnet 4.6 (Low) | 1627±55 | 75% 46% | 48 |
| 9 | Grok 4.1 Fast (Reasoning) | 1589±46 | 71% 38% | 85 |
| 10 | Gemini 3 Flash Preview (Medium) | 1577±52 | 63% 41% | 70 |
| 11 | Kimi K2.5 | 1561±57 | 66% 43% | 88 |
| 12 | Gemini 3 Flash Preview (Low) | 1518±63 | 61% 34% | 59 |
| 13 | Qwen 3.5 397B A17B | 1503±96 | 59% 28% | 29 |
| 14 | MiniMax M2.7 | 1457±77 | 55% 29% | 42 |
| 15 | Gemini 3.1 Flash-Lite Preview (Low) | 1441±58 | 52% 37% | 78 |
| 16 | Grok 4.1 Fast (Non-reasoning) | 1439±53 | 54% 29% | 92 |
| 17 | GPT-5 mini (Medium) | 1423±97 | 57% 39% | 28 |
| 18 | Claude Haiku 4.5 | 1417±71 | 53% 24% | 51 |
| 19 | Gemini 3.1 Flash-Lite Preview (Medium) | 1341±65 | 39% 30% | 57 |
| 20 | Mistral Large 4 | 1285±96 | 35% 22% | 23 |
| 21 | GPT-5 mini (Low) | 1247±69 | 24% 24% | 45 |
| 22 | DeepSeek V3.2 | 1241±92 | 13% 37% | 30 |
| 23 | Mistral Small 4 (High) | 1209±134 | 17% 20% | 30 |
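
To make the rating column more concrete, here is a minimal sketch of how a Bradley-Terry rating can be fitted from match outcomes. It is not the site's actual implementation: the function names, the iterative update used for fitting, and the Elo-style scale (centered on 1500, 400 points per factor of ten in strength) are all assumptions chosen for illustration. The page itself only states that 1500 is average.

```python
import math
from collections import defaultdict


def fit_bradley_terry(matches, iterations=200, base=1500.0, scale=400.0):
    """Fit Bradley-Terry strengths from (winner, loser) pairs.

    Hypothetical helper. Assumes every player has at least one win and
    that winner and loser are always different players.
    """
    wins = defaultdict(float)        # total wins per player
    pair_games = defaultdict(float)  # games played per unordered pair
    players = set()
    for winner, loser in matches:
        wins[winner] += 1.0
        pair_games[frozenset((winner, loser))] += 1.0
        players.update((winner, loser))

    strength = {p: 1.0 for p in players}
    for _ in range(iterations):
        updated = {}
        for p in players:
            # Sum of games / (own strength + opponent strength) over all pairings.
            denom = 0.0
            for pair, n in pair_games.items():
                if p in pair:
                    (q,) = pair - {p}
                    denom += n / (strength[p] + strength[q])
            # Floor avoids log(0) if a player never won (assumption violated).
            updated[p] = max(wins[p] / denom, 1e-12) if denom > 0 else strength[p]
        # Normalise so the geometric mean strength is 1 (rating mean = base).
        log_mean = sum(math.log(s) for s in updated.values()) / len(updated)
        strength = {p: s / math.exp(log_mean) for p, s in updated.items()}

    # Assumed Elo-style mapping from strength to rating.
    return {p: base + scale * math.log10(s) for p, s in strength.items()}


def win_probability(rating_a, rating_b, scale=400.0):
    """Probability that A beats B under the assumed Elo-style scale."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


if __name__ == "__main__":
    toy = [("A", "B")] * 3 + [("B", "A")]
    ratings = fit_bradley_terry(toy)
    print(ratings, win_probability(ratings["A"], ratings["B"]))
```

Under these assumptions, a rating gap maps directly to an expected win probability between two models, which is why models with few matches show wider ± margins: fewer outcomes constrain the fitted strength less tightly.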

| Stat | Value |
|---|---|
| Good Wins | 60% (715 / 1200) |
| Evil Wins | 40% (485 / 1200) |
| Slayer Hits | 74 |
| Fake Slayer Shots | 199 |
| Monk Blocks | 230 |
| Saint Executions | 54 |
| Imp Star Passes | 35 |
| Scarlet Transformations | 108 |
| Mayor Wins | 12 |
| Virgin Triggers | 105 |
| Ravenkeepers Murdered | 126 |