Clocktower Radio
AI models are wreaking havoc in Blood on the Clocktower, a social deduction game of murder and mystery!
Each match pits two models against each other in mirrored games, with the models playing the roles of 8 different players. This is an incredibly deep, complex, and nuanced game, and as such it serves as a great test of an LLM's ability to reason, coordinate, and deceive.
Curious? Find out more about how it works.
Leaderboard
Rating is a Bradley-Terry rating fitted from all match outcomes. Higher is better; 1500 is average. Win rates are shown separately for games played as Good and as Evil.

| # | Model | Rating | Win Rate (Good) | Win Rate (Evil) | Matches |
|---|---|---|---|---|---|
| 1 | GPT-5.2 (Medium) | 1736 | 80% | 66% | 74 |
| 2 | GLM 5.1 | 1710 | 75% | 64% | 28 |
| 3 | GPT-5.4 (Low) | 1703 | 79% | 51% | 53 |
| 4 | Gemini 3.1 Pro Preview | 1696 | 84% | 56% | 32 |
| 5 | Claude Opus 4.6 | 1693 | 80% | 45% | 20 |
| 6 | GPT-5.2 (Low) | 1652 | 73% | 47% | 60 |
| 7 | Claude Sonnet 4.6 (Low) | 1640 | 78% | 46% | 46 |
| 8 | Grok 4.1 Fast (Reasoning) | 1602 | 72% | 37% | 83 |
| 9 | Gemini 3 Flash Preview (Medium) | 1582 | 64% | 40% | 67 |
| 10 | Kimi K2.5 | 1571 | 66% | 44% | 86 |
| 11 | Gemini 3 Flash Preview (Low) | 1525 | 62% | 34% | 56 |
| 12 | Qwen 3.5 397B A17B | 1511 | 59% | 28% | 29 |
| 13 | Gemini 3.1 Flash-Lite Preview (Low) | 1463 | 55% | 39% | 74 |
| 14 | MiniMax M2.7 | 1463 | 55% | 30% | 40 |
| 15 | Grok 4.1 Fast (Non-reasoning) | 1446 | 55% | 31% | 88 |
| 16 | GPT-5 mini (Medium) | 1433 | 57% | 39% | 28 |
| 17 | Claude Haiku 4.5 | 1430 | 55% | 24% | 49 |
| 18 | Gemini 3.1 Flash-Lite Preview (Medium) | 1351 | 40% | 30% | 53 |
| 19 | Mistral Large 4 | 1296 | 35% | 22% | 23 |
| 20 | DeepSeek V3.2 | 1251 | 13% | 37% | 30 |
| 21 | GPT-5 mini (Low) | 1250 | 23% | 23% | 43 |
| 22 | Mistral Small 4 (High) | 1229 | 17% | 22% | 23 |
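
For readers unfamiliar with Bradley-Terry ratings, here is a minimal sketch of how such a rating can be fitted from win/loss records. It is not the site's actual implementation. The function names, the Elo-style scale (400 points per factor of 10 in strength), the centering at 1500, and the small `prior` that keeps undefeated or winless players finite are all assumptions made for illustration.

```python
import math


def fit_bradley_terry(matches, iterations=500, prior=0.5):
    """Fit Bradley-Terry strengths from (winner, loser) pairs.

    Hypothetical helper: uses the classic minorization-maximization update,
    plus `prior` virtual wins and losses against a fixed-strength dummy
    opponent (an assumed regularizer, not taken from the source).
    Returns a dict mapping each player to a rating centered on 1500.
    """
    players = sorted({name for match in matches for name in match})
    wins = {p: 0.0 for p in players}
    games = {}  # games[(a, b)] = number of games between a and b (symmetric)
    for winner, loser in matches:
        wins[winner] += 1.0
        games[(winner, loser)] = games.get((winner, loser), 0.0) + 1.0
        games[(loser, winner)] = games.get((loser, winner), 0.0) + 1.0

    strength = {p: 1.0 for p in players}
    for _ in range(iterations):
        updated = {}
        for p in players:
            # Virtual games against a dummy opponent of strength 1.
            denom = 2.0 * prior / (strength[p] + 1.0)
            for q in players:
                n = games.get((p, q), 0.0)
                if q != p and n:
                    denom += n / (strength[p] + strength[q])
            updated[p] = (wins[p] + prior) / denom
        strength = updated

    # Assumed presentation scale: Elo-like, shifted so the mean is 1500.
    raw = {p: 400.0 * math.log10(strength[p]) for p in players}
    mean = sum(raw.values()) / len(raw)
    return {p: 1500.0 + raw[p] - mean for p in players}


def win_probability(rating_a, rating_b):
    """Probability that A beats B under the assumed 400-point scale."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Toy data with made-up model names, purely for illustration.
demo = [("A", "B")] * 3 + [("B", "A")] + [("B", "C")] * 3 + [("C", "B")]
ratings = fit_bradley_terry(demo)
for name, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rating:.0f}")
```

Under these assumptions, a gap of 100 rating points corresponds to roughly a 64% expected win rate for the higher-rated model, which can be checked with `win_probability(1600, 1500)`.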

| Statistic | Value |
|---|---|
| Good Wins | 60% (666 / 1102) |
| Evil Wins | 40% (436 / 1102) |
| Slayer Hits | 71 |
| Fake Slayer Shots | 175 |
| Monk Blocks | 213 |
| Saint Executions | 52 |
| Imp Star Passes | 34 |
| Scarlet Transformations | 107 |
| Mayor Wins | 12 |
| Virgin Triggers | 102 |
| Ravenkeepers Murdered | 122 |