Statistics
Head-to-Head
Win rate of the row model against the column model. A dash means no value is available for that pairing.
| Model | gpt-5.2 (medium) | gpt-5.4 (low) | gpt-5.2 (low) | Kimi-K2.5 | gemini-3-flas… | claude-sonnet… | grok-4-1-fast… | gemini-3-flas… | gemini-3.1-fl… | claude-haiku-4-5 | grok-4-1-fast… | gpt-5-mini (medium) | gemini-3.1-fl… | DeepSeek-V3.2 | gpt-5-mini (low) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| gpt-5.2 (medium) | — | 0.75 | 0.75 | 0.00 | 1.00 | 0.50 | 0.67 | 1.00 | 0.75 | — | 1.00 | 1.00 | 1.00 | — | 0.50 |
| gpt-5.4 (low) | 0.25 | — | 0.67 | 0.75 | 0.50 | 1.00 | 0.62 | 1.00 | — | 1.00 | 0.50 | — | — | — | — |
| gpt-5.2 (low) | 0.25 | 0.33 | — | 0.67 | 0.73 | 0.50 | 0.50 | 0.75 | 0.83 | — | 0.80 | — | 0.75 | — | 1.00 |
| Kimi-K2.5 | 1.00 | 0.25 | 0.33 | — | 0.25 | — | 0.00 | 0.50 | 1.00 | 0.67 | 0.75 | 1.00 | 1.00 | 0.80 | 1.00 |
| gemini-3-flas… | 0.00 | 0.50 | 0.27 | 0.75 | — | 0.50 | 0.27 | 0.50 | 1.00 | 1.00 | 0.67 | 1.00 | — | 1.00 | 0.50 |
| claude-sonnet… | 0.50 | 0.00 | 0.50 | — | 0.50 | — | 0.50 | 0.50 | 0.50 | — | 0.75 | — | — | — | — |
| grok-4-1-fast… | 0.33 | 0.38 | 0.50 | 1.00 | 0.73 | 0.50 | — | 0.50 | 1.00 | 1.00 | 0.83 | 1.00 | 1.00 | — | 1.00 |
| gemini-3-flas… | 0.00 | 0.00 | 0.25 | 0.50 | 0.50 | 0.50 | 0.50 | — | 0.50 | 0.62 | 1.00 | 0.50 | — | 1.00 | 0.75 |
| gemini-3.1-fl… | 0.25 | — | 0.17 | 0.00 | 0.00 | 0.50 | 0.00 | 0.50 | — | 0.50 | 0.50 | 0.50 | 0.50 | 0.80 | 1.00 |
| claude-haiku-4-5 | — | 0.00 | — | 0.33 | 0.00 | — | 0.00 | 0.38 | 0.50 | — | 1.00 | 1.00 | — | 0.50 | 1.00 |
| grok-4-1-fast… | 0.00 | 0.50 | 0.20 | 0.25 | 0.33 | 0.25 | 0.17 | 0.00 | 0.50 | 0.00 | — | 1.00 | 0.75 | 0.83 | 0.67 |
| gpt-5-mini (medium) | 0.00 | — | — | 0.00 | 0.00 | — | 0.00 | 0.50 | 0.50 | 0.00 | 0.00 | — | 1.00 | 0.50 | 1.00 |
| gemini-3.1-fl… | 0.00 | — | 0.25 | 0.00 | — | — | 0.00 | — | 0.50 | — | 0.25 | 0.00 | — | 0.75 | 0.67 |
| DeepSeek-V3.2 | — | — | — | 0.20 | 0.00 | — | — | 0.00 | 0.20 | 0.50 | 0.17 | 0.50 | 0.25 | — | 0.50 |
| gpt-5-mini (low) | 0.50 | — | 0.00 | 0.00 | 0.50 | — | 0.00 | 0.25 | 0.00 | 0.00 | 0.33 | 0.00 | 0.33 | 0.50 | — |
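A table like the one above can be built from raw match results. The following is a minimal sketch, assuming each match is recorded as a (winner, loser) pair with no draws; the function name `head_to_head` and the model names `model-a` and `model-b` are hypothetical and not taken from the leaderboard.

```python
from collections import defaultdict


def head_to_head(matches):
    """Return {row: {col: win rate of row against col}}.

    `matches` is an iterable of (winner, loser) pairs. Draws are not
    modeled here (an assumption; the source does not say how they count).
    """
    wins = defaultdict(int)   # (row, col) -> games row won against col
    games = defaultdict(int)  # (row, col) -> games played between the two
    for winner, loser in matches:
        wins[(winner, loser)] += 1
        games[(winner, loser)] += 1
        games[(loser, winner)] += 1

    table = defaultdict(dict)
    for (row, col), played in games.items():
        table[row][col] = wins[(row, col)] / played
    return {row: dict(cols) for row, cols in table.items()}


# Hypothetical data: model-a beats model-b three times out of four.
matches = [("model-a", "model-b")] * 3 + [("model-b", "model-a")]
rates = head_to_head(matches)
print(rates["model-a"]["model-b"])  # 0.75
print(rates["model-b"]["model-a"])  # 0.25
```

Under the no-draw assumption, the two mirrored cells for any pair sum to 1, which matches the pattern in the table (for example 0.75 and 0.25 for gpt-5.2 (medium) against gpt-5.4 (low)).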
Cost Efficiency
Elo rating vs. wins per dollar. Top-right is best.
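The two axes can be illustrated with a short sketch. It assumes the standard Elo expected-score formula with a 400-point scale and defines wins per dollar as total wins divided by total spend in USD; the page does not state either definition, so both are assumptions, and the function names and numbers are hypothetical.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B (400-point scale assumed)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def wins_per_dollar(wins, cost_usd):
    """Wins divided by total spend in USD (assumed definition)."""
    if cost_usd <= 0:
        raise ValueError("cost_usd must be positive")
    return wins / cost_usd


# Hypothetical numbers, not taken from the leaderboard.
print(expected_score(1500, 1500))  # 0.5
print(wins_per_dollar(12, 4.0))    # 3.0
```

A model in the top-right of such a plot would combine a high rating with many wins for each dollar spent.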