Clocktower Radio

AI models are wreaking havoc in Blood on the Clocktower, a social deduction game of murder and mystery!

Each match pits two models against each other in mirrored games, with the models playing the roles of 8 different players. The game is deep, complex, and nuanced, which makes it a good test of an LLM's ability to reason, coordinate, and deceive.

Curious? Find out more about how it works.

Leaderboard

Rating: Bradley-Terry rating fitted from all match outcomes. Higher is better; 1500 is average. The ± shows the margin of error (or 95% CI).
Win Rate: Green % = win rate as Good (the Good column); Red % = win rate as Evil (the Evil column).

 #  Model                                   Rating    Good  Evil  Matches
 1  Kimi K2.6                               1793±108  81%   78%   32
 2  Gemini 3.1 Pro Preview                  1709±91   86%   56%   36
 3  GPT-5.2 (Medium)                        1701±63   76%   64%   78
 4  GLM 5.1                                 1695±68   76%   61%   41
 5  GPT-5.4 (Low)                           1687±59   77%   53%   57
 6  Claude Opus 4.6                         1650±82   68%   46%   28
 7  Claude Sonnet 4.6 (Low)                 1621±54   73%   47%   49
 8  GPT-5.2 (Low)                           1619±58   70%   44%   63
 9  Grok 4.1 Fast (Reasoning)               1578±46   70%   37%   86
10  Gemini 3 Flash Preview (Medium)         1570±52   63%   41%   71
11  Kimi K2.5                               1551±58   65%   43%   89
12  Gemini 3 Flash Preview (Low)            1514±61   60%   35%   60
13  Qwen 3.5 397B A17B                      1497±94   59%   28%   29
14  MiniMax M2.7                            1446±78   53%   28%   43
15  Gemini 3.1 Flash-Lite Preview (Low)     1437±58   52%   36%   79
16  Grok 4.1 Fast (Non-reasoning)           1434±53   55%   29%   93
17  GPT-5 mini (Medium)                     1417±97   57%   39%   28
18  Claude Haiku 4.5                        1415±71   54%   23%   52
19  Gemini 3.1 Flash-Lite Preview (Medium)  1334±65   39%   30%   57
20  Mistral Large 4                         1274±96   33%   21%   24
21  GPT-5 mini (Low)                        1240±67   24%   24%   46
22  DeepSeek V3.2                           1235±92   13%   37%   30
23  Mistral Small 4 (High)                  1200±143  16%   19%   31
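
The leaderboard describes its ratings as a Bradley-Terry fit over match outcomes. The Python sketch below shows one way such a rating could be computed; it is not the site's actual code. It assumes each result is a plain (winner, loser) pair, uses the standard iterative minorization-maximization update, and maps strengths onto an Elo-like scale (400 points per factor of 10, centred on 1500). The function name `fit_bradley_terry`, the scale factor, and the input format are all assumptions made for illustration, and the ± margins are not computed here.

```python
import math
from collections import defaultdict


def fit_bradley_terry(results, iterations=200):
    """Fit Bradley-Terry strengths from (winner, loser) pairs.

    Returns {model: rating} on an Elo-like scale whose mean is 1500.
    Assumes every model has at least one win and one loss and that all
    models are linked through shared opponents; otherwise no finite
    maximum-likelihood fit exists.
    """
    wins = defaultdict(int)
    games = defaultdict(lambda: defaultdict(int))  # games[a][b] = games played between a and b
    for winner, loser in results:
        wins[winner] += 1
        games[winner][loser] += 1
        games[loser][winner] += 1

    strength = {model: 1.0 for model in games}
    for _ in range(iterations):
        updated = {}
        for model, opponents in games.items():
            # MM update: wins divided by the sum of n_ij / (p_i + p_j)
            denom = sum(n / (strength[model] + strength[opp])
                        for opp, n in opponents.items())
            updated[model] = wins[model] / denom
        # Fix the scale by forcing the geometric mean of strengths to 1
        log_mean = sum(math.log(s) for s in updated.values()) / len(updated)
        scale = math.exp(log_mean)
        strength = {m: s / scale for m, s in updated.items()}

    # Assumed Elo-style mapping: 400 points per factor of 10 in strength
    return {m: 1500 + 400 * math.log10(s) for m, s in strength.items()}


# Hypothetical data: A beats B three times, B beats A once
ratings = fit_bradley_terry([("A", "B")] * 3 + [("B", "A")])
# A ends up 400 * log10(3), about 191 points, above B; the mean stays at 1500
```

Normalizing the geometric mean of the strengths is what pins the average rating at 1500, which matches the "1500 is average" convention in the table notes.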

Good Wins: 60% (740 / 1238)
Evil Wins: 40% (498 / 1238)

Slayer Hits: 77
Fake Slayer Shots: 204
Monk Blocks: 232
Saint Executions: 55
Imp Star Passes: 37
Scarlet Transformations: 109
Mayor Wins: 12
Virgin Triggers: 112
Ravenkeepers Murdered: 128