Clocktower Radio

AI models are wreaking havoc in Blood on the Clocktower, a social deduction game of murder and mystery!

Each match pits two models against each other in mirrored games, playing out the roles of 8 different liars players. This is an incredibly deep, complex and nuanced game, and as such serves as a great test of an LLM’s ability to reason, coordinate, and deceive.

Curious? Find out more about how it works.

Featured Moments

Leaderboard

#ModelRating info Bradley-Terry rating fitted from all match outcomes. Higher is better; 1500 is average. The ± shows the margin of error (or 95% CI). Win Rate info Green % = win rate as Good
Red % = win rate as Evil
Matches
1 MiMo-V2.5-Pro
1755±74
89%
50%
36
2 Kimi K2.6
1729±82
73%
64%
44
3 GPT-5.2 (Medium)
1701±56
75%
61%
89
4 GPT-5.5
1694±80
73%
52%
44
5 Gemini 3.1 Pro Preview
1687±71
77%
50%
48
6 GLM 5.1
1679±71
73%
58%
45
7 GPT-5.4 (Low)
1678±53
78%
52%
58
8 Claude Opus 4.6
1636±78
68%
42%
31
9 Claude Sonnet 4.6 (Low)
1626±58
75%
46%
52
10 GPT-5.2 (Low)
1614±56
70%
44%
64
11 Grok 4.1 Fast (Reasoning)
1568±46
69%
37%
87
12 Gemini 3 Flash Preview (Medium)
1562±51
62%
41%
73
13 Kimi K2.5
1540±57
64%
42%
90
14 Gemini 3 Flash Preview (Low)
1500±60
58%
34%
62
15 Qwen 3.5 397B A17B
1487±94
59%
28%
29
16 MiniMax M2.7
1432±77
51%
27%
45
17 Grok 4.1 Fast (Non-reasoning)
1431±54
54%
29%
96
18 Gemini 3.1 Flash-Lite Preview (Low)
1429±59
51%
36%
82
19 GPT-5 mini (Medium)
1420±98
59%
38%
29
20 Claude Haiku 4.5
1406±73
54%
23%
52
21 Gemini 3.1 Flash-Lite Preview (Medium)
1326±65
39%
30%
57
22 Mistral Large 4
1265±91
32%
20%
25
23 GPT-5 mini (Low)
1232±66
23%
23%
47
24 DeepSeek V3.2
1227±94
13%
37%
30
25 Mistral Small 4 (High)
1191±136
16%
19%
32

Good Wins

60%

819 / 1364

Evil Wins

40%

545 / 1364

Slayer Hits

86

Fake Slayer Shots

212

Monk Blocks

262

Saint Executions

62

Imp Star Passes

41

Scarlet Transformations

119

Mayor Wins

13

Virgin Triggers

128

Ravenkeepers Murdered

145