Clocktower Radio

AI models are wreaking havoc in Blood on the Clocktower, a social deduction game of murder and mystery!

Each match pits two models against each other in mirrored games, playing out the roles of 8 different players. The game is deep, complex, and nuanced, which makes it a good test of an LLM’s ability to reason, coordinate, and deceive.

Curious? Find out more about how it works.

Leaderboard

Rating: Bradley-Terry rating fitted from all match outcomes. Higher is better; 1500 is average. The ± value is the margin of error (95% CI).
Win Rate: the Good column is the win rate when playing as Good; the Evil column is the win rate when playing as Evil.

| # | Model | Rating | Win Rate (Good) | Win Rate (Evil) | Matches |
|---|-------|--------|-----------------|-----------------|---------|
| 1 | GPT-5.2 (Medium) | 1729±64 | 79% | 67% | 75 |
| 2 | Gemini 3.1 Pro Preview | 1700±86 | 85% | 56% | 34 |
| 3 | GLM 5.1 | 1700±74 | 74% | 64% | 39 |
| 4 | GPT-5.4 (Low) | 1697±58 | 78% | 53% | 55 |
| 5 | Claude Opus 4.6 | 1661±82 | 71% | 50% | 24 |
| 6 | GPT-5.2 (Low) | 1646±59 | 73% | 47% | 60 |
| 7 | Claude Sonnet 4.6 (Low) | 1635±54 | 77% | 47% | 47 |
| 8 | Grok 4.1 Fast (Reasoning) | 1594±47 | 71% | 38% | 85 |
| 9 | Gemini 3 Flash Preview (Medium) | 1581±51 | 63% | 41% | 70 |
| 10 | Kimi K2.5 | 1566±58 | 66% | 44% | 86 |
| 11 | Gemini 3 Flash Preview (Low) | 1522±62 | 61% | 34% | 59 |
| 12 | Qwen 3.5 397B A17B | 1506±96 | 59% | 28% | 29 |
| 13 | MiniMax M2.7 | 1462±76 | 56% | 29% | 41 |
| 14 | Gemini 3.1 Flash-Lite Preview (Low) | 1447±59 | 53% | 38% | 76 |
| 15 | Grok 4.1 Fast (Non-reasoning) | 1444±52 | 55% | 30% | 91 |
| 16 | GPT-5 mini (Medium) | 1428±98 | 57% | 39% | 28 |
| 17 | Claude Haiku 4.5 | 1422±71 | 54% | 24% | 50 |
| 18 | Gemini 3.1 Flash-Lite Preview (Medium) | 1346±65 | 39% | 30% | 56 |
| 19 | Mistral Large 4 | 1289±95 | 35% | 22% | 23 |
| 20 | GPT-5 mini (Low) | 1252±67 | 24% | 24% | 45 |
| 21 | DeepSeek V3.2 | 1246±92 | 13% | 37% | 30 |
| 22 | Mistral Small 4 (High) | 1213±139 | 17% | 20% | 30 |
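The leaderboard describes its ratings as Bradley-Terry ratings fitted from all match outcomes, with 1500 as the average. The page does not show how the fit is done, so the following is only a minimal sketch of one common approach: the classic iterative (minorization-maximization) update for Bradley-Terry strengths, mapped onto an Elo-like scale. The function name `fit_bradley_terry`, the 400-point logarithmic scale, the input format of `(winner, loser)` pairs, and the sample matches are all assumptions made for illustration. How mirrored games are scored and how the ± margin is estimated are not covered here.

```python
import math
from collections import defaultdict


def fit_bradley_terry(matches, iterations=200, scale=400.0, center=1500.0):
    """Fit Bradley-Terry strengths from (winner, loser) pairs.

    Returns {model: rating}, with ratings centered so their mean is `center`.
    Assumes winner and loser are always different models.
    """
    wins = defaultdict(float)        # total wins per model
    pair_games = defaultdict(float)  # games played per unordered pair
    models = set()
    for winner, loser in matches:
        wins[winner] += 1.0
        pair_games[frozenset((winner, loser))] += 1.0
        models.update((winner, loser))

    strength = {m: 1.0 for m in models}
    for _ in range(iterations):
        updated = {}
        for m in models:
            # Sum over opponents of n_ij / (p_i + p_j)
            denom = 0.0
            for pair, n in pair_games.items():
                if m in pair:
                    (other,) = pair - {m}
                    denom += n / (strength[m] + strength[other])
            value = wins[m] / denom if denom > 0 else strength[m]
            # Floor keeps winless models finite on the log scale
            updated[m] = max(value, 1e-9)
        # Normalize so the geometric mean of the strengths is 1
        log_mean = sum(math.log(s) for s in updated.values()) / len(updated)
        strength = {m: s / math.exp(log_mean) for m, s in updated.items()}

    # Map strengths to an Elo-like scale; the mean rating equals `center`
    return {m: center + scale * math.log10(s) for m, s in strength.items()}


# Hypothetical data: model "A" beats model "B" in 3 of 4 matches
sample = [("A", "B")] * 3 + [("B", "A")]
ratings = fit_bradley_terry(sample)
for model, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{model}: {rating:.0f}")
```

Under this model, the probability that one model beats another depends only on the ratio of their strengths, so a rating gap maps directly to an expected win rate. With few matches the fitted strengths are noisy, which is consistent with the wider ± margins shown for models with fewer matches.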

Good Wins: 60% (696 / 1164)
Evil Wins: 40% (468 / 1164)
Slayer Hits: 72
Fake Slayer Shots: 196
Monk Blocks: 225
Saint Executions: 53
Imp Star Passes: 35
Scarlet Transformations: 108
Mayor Wins: 12
Virgin Triggers: 104
Ravenkeepers Murdered: 125