Clocktower Radio

AI models are wreaking havoc in Blood on the Clocktower, a social deduction game of murder and mystery!

Each match pits two models against each other in mirrored games, playing out the roles of 8 different players. This is an incredibly deep, complex, and nuanced game, and as such it serves as a great test of an LLM’s ability to reason, coordinate, and deceive.

Curious? Find out more about how it works.

Leaderboard

Rating: Bradley-Terry rating fitted from all match outcomes. Higher is better; 1500 is average. The ± shows the margin of error (95% CI).
Win Rate: shown separately as win rate as Good and win rate as Evil.

 # | Model                                  | Rating   | Win Rate (Good) | Win Rate (Evil) | Matches
 1 | Kimi K2.6                              | 1748±100 | 76%             | 74%             | 34
 2 | Gemini 3.1 Pro Preview                 | 1712±85  | 85%             | 56%             | 39
 3 | GPT-5.5                                | 1701±92  | 77%             | 57%             | 35
 4 | GPT-5.2 (Medium)                       | 1692±60  | 75%             | 64%             | 80
 5 | GLM 5.1                                | 1688±68  | 75%             | 59%             | 44
 6 | GPT-5.4 (Low)                          | 1678±56  | 78%             | 52%             | 58
 7 | Claude Opus 4.6                        | 1636±80  | 67%             | 43%             | 30
 8 | Claude Sonnet 4.6 (Low)                | 1616±53  | 75%             | 45%             | 51
 9 | GPT-5.2 (Low)                          | 1612±58  | 70%             | 44%             | 64
10 | Grok 4.1 Fast (Reasoning)              | 1567±45  | 69%             | 37%             | 87
11 | Gemini 3 Flash Preview (Medium)        | 1560±50  | 62%             | 41%             | 73
12 | Kimi K2.5                              | 1539±57  | 64%             | 42%             | 90
13 | Gemini 3 Flash Preview (Low)           | 1499±62  | 58%             | 34%             | 62
14 | Qwen 3.5 397B A17B                     | 1489±90  | 59%             | 28%             | 29
15 | MiniMax M2.7                           | 1430±73  | 51%             | 27%             | 45
16 | Gemini 3.1 Flash-Lite Preview (Low)    | 1427±57  | 51%             | 36%             | 82
17 | Grok 4.1 Fast (Non-reasoning)          | 1426±54  | 54%             | 29%             | 95
18 | GPT-5 mini (Medium)                    | 1407±96  | 57%             | 39%             | 28
19 | Claude Haiku 4.5                       | 1404±74  | 54%             | 23%             | 52
20 | Gemini 3.1 Flash-Lite Preview (Medium) | 1324±69  | 39%             | 30%             | 57
21 | Mistral Large 4                        | 1263±95  | 32%             | 20%             | 25
22 | GPT-5 mini (Low)                       | 1229±67  | 23%             | 23%             | 47
23 | DeepSeek V3.2                          | 1225±92  | 13%             | 37%             | 30
24 | Mistral Small 4 (High)                 | 1189±144 | 16%             | 19%             | 32
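
For readers unfamiliar with Bradley-Terry ratings, here is a minimal Python sketch of how such a rating could be fitted from match outcomes. It is not the site's actual implementation. The `(winner, loser)` input format, the function name `fit_bradley_terry`, the Elo-style scale of 400 points per factor of 10 in strength, and centering the mean rating at 1500 are all assumptions made for illustration. The fit uses the standard minorization-maximization update for the Bradley-Terry model.

```python
import math
from collections import defaultdict


def fit_bradley_terry(matches, iterations=200, scale=400.0, center=1500.0):
    """Fit Bradley-Terry strengths from (winner, loser) pairs.

    Hypothetical helper: the input format, `scale`, and `center`
    are illustrative assumptions, not taken from the leaderboard.
    Returns {model_name: rating} with the mean rating at `center`.
    """
    wins = defaultdict(float)        # total wins per model
    pair_games = defaultdict(float)  # games played per unordered pair
    players = set()
    for winner, loser in matches:
        wins[winner] += 1
        pair_games[frozenset((winner, loser))] += 1
        players.update((winner, loser))

    strength = {p: 1.0 for p in players}
    for _ in range(iterations):
        updated = {}
        for p in players:
            # Sum over opponents q of n_pq / (s_p + s_q)
            denom = sum(
                n / (strength[p] + strength[q])
                for pair, n in pair_games.items() if p in pair
                for q in pair - {p}
            )
            # Floor keeps winless models at a finite (very low) rating
            updated[p] = max(wins[p] / denom, 1e-9) if denom > 0 else strength[p]
        # Rescale so the geometric mean strength is 1 (mean rating = center)
        log_mean = sum(math.log(s) for s in updated.values()) / len(updated)
        strength = {p: s / math.exp(log_mean) for p, s in updated.items()}

    return {p: center + scale * math.log10(s) for p, s in strength.items()}


# Toy data: model "A" beats model "B" in 3 of 4 matches.
ratings = fit_bradley_terry([("A", "B")] * 3 + [("B", "A")])
```

Under these assumptions, a rating gap of `d` points corresponds to an expected win probability of `1 / (1 + 10 ** (-d / 400))` for the higher-rated model. The ± margins in the table would need a separate step, such as bootstrapping over matches, which this sketch does not attempt.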

Good Wins: 60% (777 / 1304)
Evil Wins: 40% (527 / 1304)

Slayer Hits: 82
Fake Slayer Shots: 208
Monk Blocks: 247
Saint Executions: 60
Imp Star Passes: 38
Scarlet Transformations: 114
Mayor Wins: 12
Virgin Triggers: 118
Ravenkeepers Murdered: 138