How It Works — Clocktower Radio

What is Social Deduction? What is Blood on the Clocktower?

Social deduction games are games where players are secretly assigned roles, and the goal is to identify and eliminate the opposing team - usually through conversation and group voting. They make for terrific party games, encouraging high levels of interaction and manipulation while maintaining a strong element of strategy. Think ‘Murder Mystery Night’, but with a lot more structure and replayability.

If you have played games like Mafia, Werewolf, or watched The Traitors, you will already be familiar with the basic premise. However, Blood on the Clocktower takes the genre to a whole new level by giving each player a unique role with special abilities and allowing dead players to continue participating in conversations and influencing the game.

How can AI models play Blood on the Clocktower?

Simple: in essence, we say to each model, “Hey, this is what the game currently looks like. What would you like to do?” The model responds with its chosen action, and we then update the game state and repeat.

We do this for each player in sequence, transforming the game state into that player’s perspective of the game before passing it in. This allows us to simulate different individuals or ‘agents’ (depending on your definition) taking turns. The sequence of play is shuffled at the beginning of each phase of the game to mitigate any positional advantage.
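To make the loop concrete, here is a minimal sketch of one phase. The names GameState, view_for and run_phase are hypothetical and the state is heavily simplified; it only illustrates the idea of a per-player view and a per-phase shuffle, not the actual engine.

```python
import random
from dataclasses import dataclass, field


@dataclass
class GameState:
    # Hypothetical, simplified state: secret roles plus a public event log.
    roles: dict[str, str]
    public_log: list[str] = field(default_factory=list)


def view_for(state: GameState, player: str) -> dict:
    """Return only what `player` may see: their own role and public events."""
    return {
        "you": player,
        "your_role": state.roles[player],
        "public_log": list(state.public_log),
    }


def run_phase(state: GameState, agents: dict, rng: random.Random) -> list[str]:
    """Ask every player for one action, in an order shuffled for this phase."""
    order = list(state.roles)
    rng.shuffle(order)  # new order each phase to reduce positional advantage
    for player in order:
        action = agents[player](view_for(state, player))
        state.public_log.append(f"{player}: {action}")
    return order
```

Each agent here is just a callable that takes a view and returns an action; in practice that callable would wrap a request to a model.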

There is no need for real-time agentic systems here; they would only introduce variance that depends on each provider’s availability and speed. Reasoning models in particular take a while to respond under current infrastructure, which would create a conversational disconnect.

Is it really that simple? Who actually runs the game?

No - painfully not. The storyteller is a fully automated game master/engine, responsible for keeping track of the game state, executing player turns, handling actions, and enforcing the rules of the game. It is a fully coded, artificially-not-intelligent system that acts as a reliable harness for the models to play within. To highlight the complexity: each night the Fortune Teller selects two players and is told whether one of them (not which) is the Demon. However, one good player in the game is secretly designated as the Red Herring, and also registers as the Demon to the Fortune Teller (for the entire game). That is just one of the 22 roles from the ‘Trouble Brewing’ script/version that this game is based on.
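As an illustration, the Fortune Teller rule described above can be sketched in a few lines. The function name and parameters are hypothetical, and the sketch deliberately ignores drunkenness, poisoning and any other role that changes how a player registers.

```python
def fortune_teller_reading(chosen: tuple[str, str], demon: str, red_herring: str) -> bool:
    """Return True if either chosen player registers as the Demon.

    The red herring registers as the Demon to the Fortune Teller for the
    whole game, so a "yes" never reveals which of the two caused it.
    """
    registers_as_demon = {demon, red_herring}
    return any(player in registers_as_demon for player in chosen)
```

For example, with the Demon "Alice" and the red herring "Bob", choosing ("Bob", "Cara") gives a "yes" even though neither player is the Demon.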

Where a human storyteller in a real-life game makes nuanced decisions to keep the game flowing, the automated storyteller decides based on dice rolls and typical storyteller patterns and heuristics (e.g. a drunk or poisoned Empath always receives a wrong number, rather than one that is sometimes right). Sadly, we do not care how much fun an LLM has, but rather how well it plays the game under fair and balanced conditions.
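The Empath heuristic is a good example of a "dice roll plus rule of thumb". A minimal sketch, assuming the Empath learns how many of their two living neighbours are evil (0, 1 or 2); the function name and the `impaired` flag are hypothetical:

```python
import random


def empath_number(evil_neighbours: int, impaired: bool, rng: random.Random) -> int:
    """Number shown to the Empath; always wrong when drunk or poisoned."""
    if not 0 <= evil_neighbours <= 2:
        raise ValueError("an Empath has exactly two neighbours")
    if not impaired:
        return evil_neighbours
    # Roll among the incorrect values only, so the true number is never shown.
    wrong_values = [n for n in (0, 1, 2) if n != evil_neighbours]
    return rng.choice(wrong_values)
```

A human storyteller might sometimes give an impaired Empath the true number to muddy the waters; the fixed rule trades that nuance for consistency across games.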

Digging a bit deeper

We need to give the models all the information they need to make decisions to the best of their ability. This is done through carefully constructed prompts that are tailored to each player, role and situation. There is basic guidance but not too much, allowing the models to make their own decisions and develop their own strategies.

How prompts are structured in Clocktower Radio

An Imp cannot kill during the day. Players cannot talk during the night. Players can only vote after someone has been nominated. These kinds of rules are enforced by presenting the models with only the tools or actions that are valid at that given moment.
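A rough sketch of that filtering, covering only the three rules named above. The Phase enum, the action names and the function signature are assumptions for illustration, and details such as first-night behaviour and dead players' votes are left out.

```python
from enum import Enum


class Phase(Enum):
    NIGHT = "night"
    DAY = "day"


def available_actions(phase: Phase, role: str, alive: bool, nomination_open: bool) -> list[str]:
    """List the actions a player may be offered right now."""
    if phase is Phase.NIGHT:
        # Nobody talks at night; only a living Imp gets the kill action.
        return ["kill"] if role == "Imp" and alive else []
    actions = ["talk"]  # dead players may still join the conversation
    if nomination_open:
        actions.append("vote")  # voting only exists once someone is nominated
    elif alive:
        actions.append("nominate")
    return actions
```

Because an invalid action is never offered, the engine does not need to reject it after the fact.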

This game is balanced around human players who are expected to not have perfect memory (usually). We simulate this by asking the participating LLM to compact its game history into a fixed-size short-term memory once the history passes a certain threshold, and to compact that further into long-term memory at the end of each day. This also ensures that the models stay attentive and do not get lost in the gory details of the game.
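The two-tier memory could look something like the sketch below. Everything here is an assumption made for illustration: the Memory class, the threshold and size defaults, and the `summarise` callable, which stands in for a request asking the model to compress its own notes.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-in for an LLM call: (texts to compress, size limit) -> summary.
Summarise = Callable[[list[str], int], str]


@dataclass
class Memory:
    summarise: Summarise
    threshold: int = 20           # assumed: raw events kept before compaction
    short_term_chars: int = 500   # assumed fixed size of short-term memory
    long_term_chars: int = 1000   # assumed fixed size of long-term memory
    recent: list[str] = field(default_factory=list)
    short_term: str = ""
    long_term: str = ""

    def record(self, event: str) -> None:
        self.recent.append(event)
        if len(self.recent) >= self.threshold:
            self._compact_recent()

    def _compact_recent(self) -> None:
        # Fold raw events into short-term memory, truncating to the fixed size.
        parts = ([self.short_term] if self.short_term else []) + self.recent
        self.short_term = self.summarise(parts, self.short_term_chars)[: self.short_term_chars]
        self.recent.clear()

    def end_of_day(self) -> None:
        # Fold whatever remains into long-term memory and start the day fresh.
        if self.recent:
            self._compact_recent()
        parts = [p for p in (self.long_term, self.short_term) if p]
        if parts:
            self.long_term = self.summarise(parts, self.long_term_chars)[: self.long_term_chars]
        self.short_term = ""
```

The hard truncation after each summary is a safety net in this sketch; it guarantees the size limit even if the summariser ignores it.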

Encouraging deductive reasoning

It took a little bit more of a push to get the models to become competent participants and to treat the game as more of a puzzle to be solved, rather than a story to be told. To do this, we take a dip into the multiverse.

How thoughts are captured in Clocktower Radio

You get marks for showing your working on your Maths exam - models are forced to do the same here. By following a sort of ‘chain-of-thought’ process, models must provide evidence of having considered multiple worlds or scenarios. This draws out the deductive reasoning that forms the core of the gameplay, and it is what human players actually do when they play the game (as you would notice if you heard one think out loud).
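One way to enforce "showing your working" is to require a structured response and check it before accepting the action. The response shape below (a "worlds" list with "demon" and "evidence" keys) and the check_worlds function are purely hypothetical; the actual prompt and response format may differ.

```python
def check_worlds(response: dict, players: set[str], min_worlds: int = 2) -> list[str]:
    """Return a list of problems; an empty list means the response is acceptable."""
    problems = []
    demons_considered = set()
    for i, world in enumerate(response.get("worlds", [])):
        demon = world.get("demon")
        if demon in players:
            demons_considered.add(demon)
        else:
            problems.append(f"world {i}: unknown player {demon!r}")
        if not str(world.get("evidence", "")).strip():
            problems.append(f"world {i}: missing evidence")
    # Repeating the same suspect twice does not count as a second world.
    if len(demons_considered) < min_worlds:
        problems.append(f"need at least {min_worlds} distinct worlds")
    return problems
```

A response that fails the check could be sent back to the model with the list of problems, although how retries are handled is not described here.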

For reasoning models, this may seem redundant given their own internal reasoning capabilities, but performance after this change improved significantly across the board. Think of it as making the goal posts visible while the model dribbles across the field of reasoning.

Pasta Sauce

Overall, this approach works remarkably well: models display clear differences in skill and strategy from one another while remaining fairly consistent over time. Each match between two models (one controlling the good players, the other the evil players) consists of two games, the second being a mirror game where roles stay the same but the models swap teams to ensure fairness. Models are then ranked using a Bradley-Terry rating system (similar to Elo).
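For readers unfamiliar with Bradley-Terry: each model i gets a positive strength p_i, and the probability that i beats j is modelled as p_i / (p_i + p_j). The sketch below fits those strengths from win counts using the standard iterative (minorise-maximise) update. It is a generic textbook fit, not necessarily how the rankings here are computed; the function names are hypothetical, and it assumes every model has at least one win and one loss.

```python
def bradley_terry(wins: dict[tuple[str, str], int], iterations: int = 200) -> dict[str, float]:
    """Fit strengths from wins[(a, b)] = number of games a won against b."""
    models = sorted({m for pair in wins for m in pair})
    strength = {m: 1.0 for m in models}
    for _ in range(iterations):
        updated = {}
        for i in models:
            total_wins = sum(w for (winner, _loser), w in wins.items() if winner == i)
            denom = 0.0
            for j in models:
                if j == i:
                    continue
                games = wins.get((i, j), 0) + wins.get((j, i), 0)
                denom += games / (strength[i] + strength[j])
            updated[i] = total_wins / denom if denom else strength[i]
        # Strengths are only defined up to scale, so normalise them to sum to 1.
        norm = sum(updated.values())
        strength = {m: s / norm for m, s in updated.items()}
    return strength


def win_probability(strength: dict[str, float], a: str, b: str) -> float:
    """Modelled probability that `a` beats `b`."""
    return strength[a] / (strength[a] + strength[b])
```

With only two models, the fitted win probability simply matches the observed win rate; the model becomes useful once there are many models that have not all played each other equally often.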