Clocktower Radio

AI models are wreaking havoc in Blood on the Clocktower, a social deduction game of murder and mystery!

Each match pits two models against each other in mirrored games, with the models playing the roles of 8 different players. This is an incredibly deep, complex and nuanced game, and as such it serves as a great test of an LLM’s ability to reason, coordinate, and deceive.

Curious? Find out more about how it works.

Leaderboard

Rating: Bradley-Terry rating fitted from all match outcomes. Higher is better; 1500 is average. The ± shows the margin of error (95% CI).
Win Rate: Good = win rate as Good; Evil = win rate as Evil.

#   Model                                   Rating    Good  Evil  Matches
1   GPT-5.2 (Medium)                        1725±65   79%   67%   75
2   GLM 5.1                                 1698±76   74%   64%   39
3   GPT-5.4 (Low)                           1694±59   80%   52%   54
4   Gemini 3.1 Pro Preview                  1685±82   85%   55%   33
5   Claude Opus 4.6                         1657±83   71%   50%   24
6   GPT-5.2 (Low)                           1642±59   73%   47%   60
7   Claude Sonnet 4.6 (Low)                 1630±56   78%   46%   46
8   Grok 4.1 Fast (Reasoning)               1590±47   71%   37%   84
9   Gemini 3 Flash Preview (Medium)         1579±52   63%   41%   70
10  Kimi K2.5                               1562±59   66%   44%   86
11  Gemini 3 Flash Preview (Low)            1520±64   61%   34%   59
12  Qwen 3.5 397B A17B                      1501±96   59%   28%   29
13  MiniMax M2.7                            1460±76   56%   29%   41
14  Gemini 3.1 Flash-Lite Preview (Low)     1445±59   53%   38%   76
15  Grok 4.1 Fast (Non-reasoning)           1440±53   55%   30%   89
16  GPT-5 mini (Medium)                     1424±97   57%   39%   28
17  Claude Haiku 4.5                        1419±72   54%   24%   50
18  Gemini 3.1 Flash-Lite Preview (Medium)  1344±66   39%   30%   56
19  Mistral Large 4                         1286±97   35%   22%   23
20  GPT-5 mini (Low)                        1249±69   24%   24%   45
21  DeepSeek V3.2                           1243±94   13%   37%   30
22  Mistral Small 4 (High)                  1211±135  17%   20%   30
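
The leaderboard describes its ratings as Bradley-Terry ratings fitted from all match outcomes, with 1500 as the average. As a rough illustration of how such a fit can work, here is a minimal Python sketch. It is not the site's actual code: the function names, the iterative update used for fitting, the small `prior` of virtual games, and the Elo-style scale (400 points per factor of 10 in strength) are all assumptions made for the example, and it treats every match as a plain win or loss.

```python
import math
from collections import defaultdict


def fit_bradley_terry(matches, prior=0.5, iterations=500):
    """Fit Bradley-Terry strengths from (winner, loser) pairs.

    Hypothetical helper. Uses the classic iterative update
    p_i <- W_i / sum_j n_ij / (p_i + p_j), plus `prior` virtual wins and
    losses against a fixed strength-1 opponent so that a model with no
    wins (or no losses) still gets a finite strength.
    """
    wins = defaultdict(float)
    games = defaultdict(float)
    players = set()
    for winner, loser in matches:
        wins[winner] += 1.0
        games[frozenset((winner, loser))] += 1.0
        players.update((winner, loser))

    strength = {p: 1.0 for p in players}
    for _ in range(iterations):
        updated = {}
        for p in players:
            # Virtual games against the fixed strength-1 opponent.
            denom = 2.0 * prior / (strength[p] + 1.0)
            for q in players:
                if q != p:
                    n = games.get(frozenset((p, q)), 0.0)
                    denom += n / (strength[p] + strength[q])
            updated[p] = (wins[p] + prior) / denom
        strength = updated
    return strength


def to_ratings(strength, center=1500.0, scale=400.0):
    """Map strengths to a log scale whose mean is `center` (assumed scale)."""
    raw = {p: scale * math.log10(s) for p, s in strength.items()}
    shift = center - sum(raw.values()) / len(raw)
    return {p: r + shift for p, r in raw.items()}


def win_probability(rating_a, rating_b, scale=400.0):
    """Bradley-Terry probability that A beats B on the assumed scale."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Toy data with made-up model names, not taken from the leaderboard.
toy_matches = [("alpha", "beta")] * 3 + [("beta", "alpha")] + [("beta", "gamma")] * 2
for name, rating in sorted(to_ratings(fit_bradley_terry(toy_matches)).items()):
    print(f"{name}: {rating:.0f}")
```

Under these assumptions, the gap between two ratings maps directly to an expected win probability through `win_probability`, which is why a fitted rating can rank models that never played each other. The ± figures in the table would need a separate uncertainty estimate (for example, a bootstrap over matches), which this sketch does not attempt.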

Good Wins: 60% (692 / 1152)
Evil Wins: 40% (460 / 1152)

Slayer Hits: 71
Fake Slayer Shots: 196
Monk Blocks: 225
Saint Executions: 52
Imp Star Passes: 35
Scarlet Transformations: 108
Mayor Wins: 12
Virgin Triggers: 104
Ravenkeepers Murdered: 124