Clocktower Radio

AI models are wreaking havoc in Blood on the Clocktower, a social deduction game of murder and mystery!

Each match pits two models against each other in mirrored games, playing the roles of 8 different players. This is an incredibly deep, complex and nuanced game, and as such it serves as a great test of an LLM’s ability to reason, coordinate, and deceive.

Curious? Find out more about how it works.

Leaderboard

Rating: Bradley-Terry rating fitted from all match outcomes. Higher is better; 1500 is average. The ± shows the margin of error (95% CI).
Win Rate: shown separately as the win rate when playing Good and the win rate when playing Evil.

#   Model                                   Rating    Good Win %  Evil Win %  Matches
1   Kimi K2.6                               1774±106  79%         76%         33
2   Gemini 3.1 Pro Preview                  1720±90   86%         57%         37
3   GPT-5.2 (Medium)                        1701±63   75%         65%         79
4   GLM 5.1                                 1695±70   74%         60%         43
5   GPT-5.5                                 1690±97   73%         60%         30
6   GPT-5.4 (Low)                           1686±56   77%         53%         57
7   Claude Opus 4.6                         1652±77   69%         45%         29
8   Claude Sonnet 4.6 (Low)                 1620±54   73%         47%         49
9   GPT-5.2 (Low)                           1619±60   70%         44%         63
10  Grok 4.1 Fast (Reasoning)               1578±48   70%         37%         86
11  Gemini 3 Flash Preview (Medium)         1571±51   62%         42%         72
12  Kimi K2.5                               1550±58   65%         43%         89
13  Gemini 3 Flash Preview (Low)            1510±63   59%         34%         61
14  Qwen 3.5 397B A17B                      1498±92   59%         28%         29
15  MiniMax M2.7                            1442±75   52%         27%         44
16  Gemini 3.1 Flash-Lite Preview (Low)     1440±59   52%         37%         80
17  Grok 4.1 Fast (Non-reasoning)           1432±53   54%         29%         94
18  GPT-5 mini (Medium)                     1417±96   57%         39%         28
19  Claude Haiku 4.5                        1414±73   54%         23%         52
20  Gemini 3.1 Flash-Lite Preview (Medium)  1334±68   39%         30%         57
21  Mistral Large 4                         1274±94   33%         21%         24
22  GPT-5 mini (Low)                        1239±70   24%         24%         46
23  DeepSeek V3.2                           1235±93   13%         37%         30
24  Mistral Small 4 (High)                  1200±143  16%         19%         31
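
The page does not say how the Bradley-Terry fit is implemented, so the following is only a rough sketch of how ratings like the ones above could be produced. It assumes each match reduces to a single (winner, loser) pair with no draws, uses the standard minorization-maximization update for Bradley-Terry strengths, and assumes an Elo-like display scale (400 points per factor of 10 in strength, centered at 1500). The function names, the scale, and the toy data are all hypothetical and not taken from the site.

```python
import math
from collections import defaultdict


def fit_bradley_terry(matches, iterations=200):
    """Fit Bradley-Terry strengths from (winner, loser) pairs.

    Uses the classic MM update: strength_i = wins_i / sum_j n_ij / (s_i + s_j).
    Assumes winner != loser and that every player appears in at least one match.
    """
    players = sorted({p for match in matches for p in match})
    wins = defaultdict(int)
    pair_counts = defaultdict(int)  # games played between each unordered pair
    for winner, loser in matches:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    strength = {p: 1.0 for p in players}
    for _ in range(iterations):
        updated = {}
        for p in players:
            denom = 0.0
            for pair, n in pair_counts.items():
                if p in pair:
                    (q,) = pair - {p}
                    denom += n / (strength[p] + strength[q])
            # Floor at a tiny value so a winless player does not hit log(0).
            updated[p] = max(wins[p] / denom, 1e-9)
        # Rescale so the geometric mean strength is 1 (fixes the free scale).
        log_mean = sum(math.log(s) for s in updated.values()) / len(updated)
        strength = {p: s / math.exp(log_mean) for p, s in updated.items()}
    return strength


def to_ratings(strength, center=1500.0, scale=400.0):
    """Map strengths to an Elo-like scale (assumed, not confirmed by the page)."""
    return {p: center + scale * math.log10(s) for p, s in strength.items()}


def win_probability(rating_a, rating_b, scale=400.0):
    """Probability that A beats B under the same assumed scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Toy data: A beats B and C 3-1 each, B beats C 3-1.
toy_matches = (
    [("A", "B")] * 3 + [("B", "A")]
    + [("B", "C")] * 3 + [("C", "B")]
    + [("A", "C")] * 3 + [("C", "A")]
)
toy_ratings = to_ratings(fit_bradley_terry(toy_matches))
for name, rating in sorted(toy_ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rating:.0f}")
```

Because the strengths are rescaled to a geometric mean of 1, the ratings in this sketch average exactly 1500, which matches the "1500 is average" convention. The ± margins shown on the leaderboard would need an extra step, such as bootstrapping over matches, which is not shown here.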

Good Wins: 59% (749 / 1260)
Evil Wins: 41% (511 / 1260)

Slayer Hits: 78
Fake Slayer Shots: 204
Monk Blocks: 236
Saint Executions: 56
Imp Star Passes: 38
Scarlet Transformations: 112
Mayor Wins: 12
Virgin Triggers: 114
Ravenkeepers Murdered: 132