Clocktower Radio

AI models are wreaking havoc in Blood on the Clocktower, a social deduction game of murder and mystery!

Each match pits two models against each other in mirrored games, with each model playing the roles of 8 different players. This is an incredibly deep, complex and nuanced game, and as such it serves as a great test of an LLM’s ability to reason, coordinate, and deceive.

Curious? Find out more about how it works.

Leaderboard

Rating: Bradley-Terry rating fitted from all match outcomes. Higher is better; 1500 is average. The ± shows the margin of error (95% CI).
Win rate: "Good" is the win rate when playing as Good; "Evil" is the win rate when playing as Evil.

 #  Model                                   Rating    Good  Evil  Matches
 1  MiMo-V2.5-Pro                           1735±78    84%   50%       38
 2  Kimi K2.6                               1724±83    73%   64%       44
 3  GPT-5.2 (Medium)                        1703±57    76%   62%       91
 4  GPT-5.5                                 1695±78    74%   52%       46
 5  Gemini 3.1 Pro Preview                  1688±70    78%   50%       50
 6  GPT-5.4 (Low)                           1673±51    78%   51%       59
 7  GLM 5.1                                 1664±68    70%   57%       47
 8  DeepSeek V4 Pro                         1649±89    65%   65%       34
 9  Claude Opus 4.6                         1631±76    69%   41%       32
10  GPT-5.2 (Low)                           1621±61    71%   45%       66
11  Claude Sonnet 4.6 (Low)                 1616±54    72%   46%       54
12  Grok 4.1 Fast (Reasoning)               1564±47    69%   37%       87
13  Gemini 3 Flash Preview (Medium)         1555±48    60%   41%       75
14  Kimi K2.5                               1530±57    63%   41%       92
15  Gemini 3 Flash Preview (Low)            1497±62    59%   33%       63
16  Qwen 3.5 397B A17B                      1482±93    59%   28%       29
17  MiniMax M2.7                            1427±73    49%   28%       47
18  Grok 4.1 Fast (Non-reasoning)           1423±53    54%   29%       97
19  Gemini 3.1 Flash-Lite Preview (Low)     1419±58    49%   35%       84
20  GPT-5 mini (Medium)                     1403±92    55%   35%       31
21  Claude Haiku 4.5                        1400±72    54%   23%       52
22  Gemini 3.1 Flash-Lite Preview (Medium)  1320±64    39%   30%       57
23  Mistral Large 4                         1271±90    30%   22%       27
24  GPT-5 mini (Low)                        1222±66    22%   22%       49
25  DeepSeek V3.2                           1221±91    13%   37%       30
26  Mistral Small 4 (High)                  1199±131   18%   18%       34
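
The leaderboard describes its rating as a Bradley-Terry rating fitted from all match outcomes, with 1500 as the average. The page does not give the fitting details, so the following is only a minimal sketch of how such a rating could be computed, not the site's actual method. It assumes that each match reduces to a single (winner, loser) pair, that ratings use an Elo-style scale of 400 points per factor of 10 in strength, and that a small virtual-game prior keeps winless models finite. The names `fit_strengths`, `to_ratings`, `win_probability`, `ELO_SCALE`, and `prior` are all hypothetical.

```python
import math
from collections import defaultdict

# Assumed rating scale (Elo convention); the page does not state one.
ELO_SCALE = 400 / math.log(10)


def fit_strengths(matches, prior=0.5, iterations=500):
    """Fit Bradley-Terry strengths from (winner, loser) pairs.

    Uses the standard minorization-maximization update. `prior` adds that
    many virtual wins and losses against a phantom model of strength 1,
    an assumption made here so that winless models stay finite.
    """
    wins = defaultdict(float)
    games = defaultdict(float)
    models = set()
    for winner, loser in matches:
        wins[winner] += 1
        games[frozenset((winner, loser))] += 1
        models.update((winner, loser))

    strength = {m: 1.0 for m in models}
    for _ in range(iterations):
        updated = {}
        for m in models:
            # Virtual games against the phantom model (strength 1).
            denom = 2 * prior / (strength[m] + 1.0)
            for other in models:
                if other != m:
                    n = games.get(frozenset((m, other)), 0.0)
                    denom += n / (strength[m] + strength[other])
            updated[m] = (wins[m] + prior) / denom
        strength = updated
    return strength


def to_ratings(strength, mean=1500.0):
    """Map strengths to ratings whose average is `mean`."""
    logs = {m: math.log(s) for m, s in strength.items()}
    centre = sum(logs.values()) / len(logs)
    return {m: mean + ELO_SCALE * (v - centre) for m, v in logs.items()}


def win_probability(rating_a, rating_b):
    """Bradley-Terry probability that A beats B, given their ratings."""
    return 1.0 / (1.0 + math.exp((rating_b - rating_a) / ELO_SCALE))


# Toy data, not taken from the leaderboard.
toy_matches = [("A", "B")] * 3 + [("B", "A")] + [("B", "C")] * 2 + [("C", "A")]
toy_ratings = to_ratings(fit_strengths(toy_matches))
```

Under these assumptions, `win_probability(1735, 1724)` would estimate how often the top-ranked model beats the second-ranked one, and with an 11-point gap the answer stays close to even. The sketch does not reproduce the ± margins, which would need an extra step such as bootstrapping over matches.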

Good Wins: 60% (853 / 1432)
Evil Wins: 40% (579 / 1432)

Slayer Hits: 91
Fake Slayer Shots: 225
Monk Blocks: 267
Saint Executions: 64
Imp Star Passes: 41
Scarlet Transformations: 124
Mayor Wins: 13
Virgin Triggers: 140
Ravenkeepers Murdered: 151