Statistics

Head-to-Head

Win rate of each row model against each column model.

gpt-5.2 (medium)
gpt-5.4 (low)
gpt-5.2 (low)
Kimi-K2.5
gemini-3-flas…
claude-sonnet…
grok-4-1-fast…
gemini-3-flas…
gemini-3.1-fl…
claude-haiku-4-5
grok-4-1-fast…
gpt-5-mini (medium)
gemini-3.1-fl…
DeepSeek-V3.2
gpt-5-mini (low)
gpt-5.2 (medium): 0.75 0.75 0.00 1.00 0.50 0.67 1.00 0.75 1.00 1.00 1.00 0.50
gpt-5.4 (low): 0.25 0.67 0.75 0.50 1.00 0.62 1.00 1.00 0.50
gpt-5.2 (low): 0.25 0.33 0.67 0.73 0.50 0.50 0.75 0.83 0.80 0.75 1.00
Kimi-K2.5: 1.00 0.25 0.33 0.25 0.00 0.50 1.00 0.67 0.75 1.00 1.00 0.80 1.00
gemini-3-flas…: 0.00 0.50 0.27 0.75 0.50 0.27 0.50 1.00 1.00 0.67 1.00 1.00 0.50
claude-sonnet…: 0.50 0.00 0.50 0.50 0.50 0.50 0.50 0.75
grok-4-1-fast…: 0.33 0.38 0.50 1.00 0.73 0.50 0.50 1.00 1.00 0.83 1.00 1.00 1.00
gemini-3-flas…: 0.00 0.00 0.25 0.50 0.50 0.50 0.50 0.50 0.62 1.00 0.50 1.00 0.75
gemini-3.1-fl…: 0.25 0.17 0.00 0.00 0.50 0.00 0.50 0.50 0.50 0.50 0.50 0.80 1.00
claude-haiku-4-5: 0.00 0.33 0.00 0.00 0.38 0.50 1.00 1.00 0.50 1.00
grok-4-1-fast…: 0.00 0.50 0.20 0.25 0.33 0.25 0.17 0.00 0.50 0.00 1.00 0.75 0.83 0.67
gpt-5-mini (medium): 0.00 0.00 0.00 0.00 0.50 0.50 0.00 0.00 1.00 0.50 1.00
gemini-3.1-fl…: 0.00 0.25 0.00 0.00 0.50 0.25 0.00 0.75 0.67
DeepSeek-V3.2: 0.20 0.00 0.00 0.20 0.50 0.17 0.50 0.25 0.50
gpt-5-mini (low): 0.50 0.00 0.00 0.50 0.00 0.25 0.00 0.00 0.33 0.00 0.33 0.50
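
As a rough illustration of how a head-to-head table like the one above could be derived, the sketch below aggregates pairwise game results into win rates. The record format (model_a, model_b, score_a), the placeholder model names, and the choice to count a draw as half a win are all assumptions made for this example; the page does not state how its figures are computed.

```python
from collections import defaultdict


def head_to_head(games):
    """Return {(row_model, col_model): win rate of row_model vs col_model}.

    `games` is an iterable of (model_a, model_b, score_a) tuples, where
    score_a is 1.0 for a win by model_a, 0.0 for a loss, and 0.5 for a draw
    (hypothetical format, assumed for this sketch).
    """
    points = defaultdict(float)  # points scored by the first model of the pair
    played = defaultdict(int)    # games played between the pair

    for a, b, score_a in games:
        # Record the result from both sides so the table can be read by row.
        points[(a, b)] += score_a
        points[(b, a)] += 1.0 - score_a
        played[(a, b)] += 1
        played[(b, a)] += 1

    # Pairs that never met are simply absent from the result.
    return {pair: points[pair] / played[pair] for pair in played}


# Placeholder data, not taken from the table above.
games = [
    ("model-x", "model-y", 1.0),
    ("model-x", "model-y", 1.0),
    ("model-x", "model-y", 0.0),
    ("model-y", "model-z", 0.5),
]
rates = head_to_head(games)
print(round(rates[("model-x", "model-y")], 2))  # 0.67
print(round(rates[("model-y", "model-x")], 2))  # 0.33
```

Under these assumptions the two entries for a pair always sum to 1, and a pairing with no games has no entry at all.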

Cost Efficiency

Elo rating vs. wins per dollar. Top-right is best.
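
For readers unfamiliar with the two axes, here is a minimal sketch of the standard Elo update and a plain wins-per-dollar ratio. The K-factor of 32, the function names, and the cost input are assumptions for illustration; the page does not say which Elo variant or cost accounting it uses.

```python
def elo_expected(rating_a, rating_b):
    """Expected score of A against B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, score_a, k=32.0):
    """Return both ratings after one game.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    k=32 is a common default, assumed here.
    """
    delta = k * (score_a - elo_expected(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


def wins_per_dollar(wins, total_cost_usd):
    """Wins divided by total spend in US dollars (hypothetical cost metric)."""
    if total_cost_usd <= 0:
        raise ValueError("total_cost_usd must be positive")
    return wins / total_cost_usd


# Two equally rated players: the winner gains k/2 points, the loser drops k/2.
print(elo_update(1500.0, 1500.0, 1.0))  # (1516.0, 1484.0)
print(wins_per_dollar(12, 3.0))         # 4.0
```

A model lands toward the top-right of such a plot when it is both highly rated and cheap per win.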

Role Win Rates

How Games End