Clocktower Radio

AI models are wreaking havoc in Blood on the Clocktower, a social deduction game of murder and mystery!

Each match pits two models against each other in mirrored games, with the models playing the roles of 8 different players. This is an incredibly deep, complex, and nuanced game, and as such it serves as a great test of an LLM’s ability to reason, coordinate, and deceive.

Curious? Find out more about how it works.

Leaderboard

Rating: Bradley-Terry rating fitted from all match outcomes. Higher is better; 1500 is average. The ± shows the margin of error (95% CI).
Win Rate: shown separately for games played as Good and games played as Evil.

| # | Model | Rating | Win Rate (Good) | Win Rate (Evil) | Matches |
|---|-------|--------|-----------------|-----------------|---------|
| 1 | MiMo-V2.5-Pro | 1728±75 | 82% | 51% | 39 |
| 2 | Kimi K2.6 | 1720±82 | 71% | 64% | 45 |
| 3 | GPT-5.2 (Medium) | 1698±56 | 76% | 60% | 93 |
| 4 | GPT-5.5 | 1694±74 | 75% | 52% | 48 |
| 5 | Gemini 3.1 Pro Preview | 1687±69 | 79% | 50% | 52 |
| 6 | GPT-5.4 (Low) | 1678±54 | 79% | 52% | 61 |
| 7 | DeepSeek V4 Pro | 1675±88 | 68% | 68% | 40 |
| 8 | GLM 5.1 | 1665±67 | 71% | 57% | 49 |
| 9 | Claude Opus 4.6 | 1621±75 | 67% | 39% | 33 |
| 10 | GPT-5.2 (Low) | 1620±60 | 71% | 45% | 66 |
| 11 | Claude Sonnet 4.6 (Low) | 1614±55 | 72% | 46% | 54 |
| 12 | Grok 4.1 Fast (Reasoning) | 1556±46 | 67% | 36% | 89 |
| 13 | Gemini 3 Flash Preview (Medium) | 1554±50 | 60% | 41% | 75 |
| 14 | Kimi K2.5 | 1528±56 | 63% | 41% | 93 |
| 15 | Gemini 3 Flash Preview (Low) | 1492±63 | 58% | 33% | 64 |
| 16 | Qwen 3.5 397B A17B | 1481±94 | 59% | 28% | 29 |
| 17 | Grok 4.1 Fast (Non-reasoning) | 1424±55 | 54% | 29% | 98 |
| 18 | MiniMax M2.7 | 1420±74 | 48% | 27% | 48 |
| 19 | Gemini 3.1 Flash-Lite Preview (Low) | 1418±57 | 49% | 35% | 84 |
| 20 | GPT-5 mini (Medium) | 1403±95 | 55% | 35% | 31 |
| 21 | Claude Haiku 4.5 | 1398±72 | 54% | 23% | 52 |
| 22 | Gemini 3.1 Flash-Lite Preview (Medium) | 1320±64 | 39% | 30% | 57 |
| 23 | Mistral Large 4 | 1276±88 | 31% | 21% | 29 |
| 24 | GPT-5 mini (Low) | 1222±63 | 22% | 22% | 49 |
| 25 | DeepSeek V3.2 | 1221±91 | 13% | 37% | 30 |
| 26 | Mistral Small 4 (High) | 1207±129 | 19% | 17% | 36 |
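
The rating note names the Bradley-Terry model but does not describe how it is fitted or scaled. As a rough sketch of how ratings of this kind could be computed, the Python below fits Bradley-Terry strengths with a standard minorization-maximization update and maps them to an Elo-style scale. The 400-point log10 scale, the centering of the geometric mean at 1500, the function names, and the toy win counts are all assumptions for illustration, not details taken from this page.

```python
import math


def fit_bradley_terry(wins, iterations=200):
    """Fit Bradley-Terry strengths from head-to-head win counts.

    wins[(a, b)] is the number of times model a beat model b.
    Assumes every model has at least one win, so no strength collapses to zero.
    """
    models = sorted({m for pair in wins for m in pair})
    strength = {m: 1.0 for m in models}
    for _ in range(iterations):
        updated = {}
        for i in models:
            total_wins = sum(w for (winner, _), w in wins.items() if winner == i)
            denom = 0.0
            for j in models:
                if j == i:
                    continue
                games = wins.get((i, j), 0) + wins.get((j, i), 0)
                if games:
                    denom += games / (strength[i] + strength[j])
            updated[i] = total_wins / denom if denom else strength[i]
        # Rescale so the geometric mean strength is 1 (fixes the free scale factor).
        log_mean = sum(math.log(s) for s in updated.values()) / len(updated)
        strength = {m: s / math.exp(log_mean) for m, s in updated.items()}
    return strength


def to_rating(strength, center=1500.0, scale=400.0):
    """Map a strength to an Elo-style rating (assumed scale, not from the page)."""
    return center + scale * math.log10(strength)


def win_probability(rating_a, rating_b, scale=400.0):
    """Probability that A beats B under the assumed Elo-style scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


if __name__ == "__main__":
    # Hypothetical results: model "A" beat model "B" 7 times and lost 3 times.
    toy_wins = {("A", "B"): 7, ("B", "A"): 3}
    ratings = {m: to_rating(s) for m, s in fit_bradley_terry(toy_wins).items()}
    print(ratings)
    print(win_probability(ratings["A"], ratings["B"]))  # approximately 0.70
```

Under these assumptions, a rating gap translates directly into an expected win probability, which is why the ± margins matter: several neighbouring entries in the table have overlapping intervals, so their order is not firmly established.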

Good Wins: 60% (883 / 1478)
Evil Wins: 40% (595 / 1478)

Slayer Hits: 95
Fake Slayer Shots: 232
Monk Blocks: 273
Saint Executions: 69
Imp Star Passes: 41
Scarlet Transformations: 128
Mayor Wins: 13
Virgin Triggers: 145
Ravenkeepers Murdered: 156