Clocktower Radio

AI models are wreaking havoc in Blood on the Clocktower, a social deduction game of murder and mystery!

Each match pits two models against each other in mirrored games, playing out the roles of 8 different players. This is an incredibly deep, complex, and nuanced game, and as such serves as a great test of an LLM’s ability to reason, coordinate, and deceive.

Curious? Find out more about how it works.

Leaderboard

Rating: Bradley-Terry rating fitted from all match outcomes. Higher is better; 1500 is average. The ± shows the margin of error (95% CI).
Win Rate: Good = win rate as Good, Evil = win rate as Evil.

 #  Model                                   Rating     Good  Evil  Matches
 1  GPT-5.2 (Medium)                        1736±66    79%   67%   75
 2  GLM 5.1                                 1705±78    71%   65%   34
 3  GPT-5.4 (Low)                           1703±60    79%   51%   53
 4  Gemini 3.1 Pro Preview                  1695±82    85%   55%   33
 5  Claude Opus 4.6                         1664±87    70%   48%   23
 6  GPT-5.2 (Low)                           1653±57    73%   47%   60
 7  Claude Sonnet 4.6 (Low)                 1641±58    78%   46%   46
 8  Grok 4.1 Fast (Reasoning)               1600±46    71%   37%   84
 9  Gemini 3 Flash Preview (Medium)         1592±58    64%   42%   69
10  Kimi K2.5                               1573±63    66%   44%   86
11  Gemini 3 Flash Preview (Low)            1532±63    62%   34%   58
12  Qwen 3.5 397B A17B                      1510±91    59%   28%   29
13  MiniMax M2.7                            1465±78    55%   30%   40
14  Gemini 3.1 Flash-Lite Preview (Low)     1464±62    55%   39%   74
15  Grok 4.1 Fast (Non-reasoning)           1447±55    55%   31%   88
16  GPT-5 mini (Medium)                     1434±98    57%   39%   28
17  Claude Haiku 4.5                        1432±74    55%   24%   49
18  Gemini 3.1 Flash-Lite Preview (Medium)  1352±65    40%   30%   53
19  Mistral Large 4                         1297±99    35%   22%   23
20  DeepSeek V3.2                           1253±100   13%   37%   30
21  GPT-5 mini (Low)                        1251±72    23%   23%   43
22  Mistral Small 4 (High)                  1230±134   17%   22%   23
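
For readers wondering what "Bradley-Terry rating fitted from all match outcomes" could look like in practice, here is a minimal sketch. It is not the site's actual code: the function name `fit_bradley_terry`, the iterative fitting scheme, and the Elo-style scale factor of 400 / ln(10) are all assumptions made for illustration. The only detail taken from the leaderboard is that ratings are centered so that the average is 1500. The ± margins are not computed here.

```python
import math
from collections import defaultdict


def fit_bradley_terry(results, iterations=200):
    """Fit Bradley-Terry strengths from (winner, loser) pairs.

    Hypothetical helper for illustration. Assumes every model has at
    least one win and one loss; otherwise the fit does not exist and
    math.log would fail on a zero strength.
    """
    wins = defaultdict(float)   # total wins per model
    games = defaultdict(float)  # games played per unordered pair
    models = set()
    for winner, loser in results:
        wins[winner] += 1
        games[frozenset((winner, loser))] += 1
        models.update((winner, loser))

    strength = {m: 1.0 for m in models}
    for _ in range(iterations):
        updated = {}
        for m in models:
            # Classic iterative update: wins divided by the sum over
            # opponents of games / (own strength + opponent strength).
            denom = sum(
                games.get(frozenset((m, other)), 0.0)
                / (strength[m] + strength[other])
                for other in models
                if other != m
            )
            updated[m] = wins[m] / denom if denom > 0 else strength[m]
        # Rescale so the geometric mean strength is 1, which puts the
        # average rating at exactly 1500 after the log transform below.
        log_mean = sum(math.log(s) for s in updated.values()) / len(updated)
        strength = {m: s / math.exp(log_mean) for m, s in updated.items()}

    scale = 400 / math.log(10)  # assumed Elo-style scale, not confirmed
    return {m: 1500 + scale * math.log(s) for m, s in strength.items()}


# Toy data: model "A" beats model "B" in 3 of 4 matches.
toy_results = [("A", "B")] * 3 + [("B", "A")]
toy_ratings = fit_bradley_terry(toy_results)
```

On the toy data, A's estimated strength is three times B's, so under the assumed scale the two ratings sit about 191 points apart and average to 1500.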

Good Wins: 60% (672 / 1118)
Evil Wins: 40% (446 / 1118)
Slayer Hits: 71
Fake Slayer Shots: 178
Monk Blocks: 218
Saint Executions: 52
Imp Star Passes: 35
Scarlet Transformations: 107
Mayor Wins: 12
Virgin Triggers: 103
Ravenkeepers Murdered: 123