WorldCupForecastBench 2026
Analytics
Compare model performance across horizons, access conditions, prompt strategies, stages, and reliability metrics.
Ranked leaderboard
Scores
Match order
Performance over time
Series:
Claude Fable 5 / Closed Book / Direct Score / STAGE_OPENING
Claude Fable 5 / Closed Book / Probabilistic Forecast / STAGE_OPENING
Claude Fable 5 / Open Book / Direct Score / STAGE_OPENING
Claude Fable 5 / Open Book / Probabilistic Forecast / STAGE_OPENING
Claude Fable 5 / Closed Book / Direct Score / T_24H
Claude Fable 5 / Closed Book / Probabilistic Forecast / T_24H
Claude Fable 5 / Open Book / Direct Score / T_24H
Claude Fable 5 / Open Book / Probabilistic Forecast / T_24H
Detailed leaderboard
Model configurations
| Rank | Model | Provider | Horizon | Access | Prompt | Scored | Points | Brier | Log loss | Top acc. | Exact | GD acc. | Total-goals err. | Invalid | Repair | Search | Selected metric |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| #1 | Grok 4.3 | xAI | T_24H | Closed Book | Direct Score | 5 | 16 | 0.519 | 0.881 | 60% | 60% | 60% | 0.600 | 0% | 0% | - | 16 |
| #2 | Grok 4.3 | xAI | T_2H | Closed Book | Direct Score | 5 | 15 | 0.524 | 0.890 | 60% | 60% | 60% | 0.600 | 0% | 0% | - | 15 |
| #3 | Claude Fable 5 | Anthropic | STAGE_OPENING | Closed Book | Direct Score | 5 | 11 | 0.646 | 1.045 | 40% | 40% | 40% | 1.000 | 0% | 0% | - | 11 |
| #3 | Claude Fable 5 | Anthropic | T_24H | Closed Book | Direct Score | 5 | 11 | 0.663 | 1.071 | 40% | 40% | 40% | 1.000 | 30% | 0% | - | 11 |
| #3 | DeepSeek V4 Pro | DeepSeek | STAGE_OPENING | Closed Book | Direct Score | 5 | 11 | 0.570 | 0.945 | 60% | 40% | 40% | 1.000 | 0% | 0% | - | 11 |
| #3 | DeepSeek V4 Pro | DeepSeek | T_2H | Closed Book | Direct Score | 5 | 11 | 0.569 | 0.935 | 60% | 40% | 40% | 0.600 | 0% | 0% | - | 11 |
| #3 | Gemini 3.1 Pro | | STAGE_OPENING | Closed Book | Direct Score | 5 | 11 | 0.604 | 0.980 | 60% | 40% | 40% | 1.000 | 0% | 0% | - | 11 |
| #3 | Gemini 3.1 Pro | | T_24H | Closed Book | Direct Score | 5 | 11 | 0.597 | 0.972 | 60% | 40% | 40% | 1.000 | 0% | 0% | - | 11 |
| #3 | Gemini 3.1 Pro | | T_2H | Closed Book | Direct Score | 5 | 11 | 0.612 | 0.997 | 60% | 40% | 40% | 1.000 | 0% | 0% | - | 11 |
| #3 | Gemini 3.1 Pro | | T_24H | Closed Book | Probabilistic Forecast | 5 | 11 | 0.589 | 0.962 | 60% | 40% | 40% | 1.000 | 0% | 0% | - | 11 |
| #3 | Gemini 3.1 Pro | | T_2H | Closed Book | Probabilistic Forecast | 5 | 11 | 0.614 | 0.996 | 60% | 40% | 40% | 1.000 | 0% | 0% | - | 11 |
| #3 | Grok 4.3 | xAI | STAGE_OPENING | Closed Book | Direct Score | 5 | 11 | 0.532 | 0.902 | 60% | 40% | 40% | 0.800 | 0% | 0% | - | 11 |
| #3 | Grok 4.3 | xAI | STAGE_OPENING | Closed Book | Probabilistic Forecast | 5 | 11 | 0.537 | 0.912 | 60% | 40% | 40% | 1.000 | 0% | 0% | - | 11 |
| #3 | Grok 4.3 | xAI | T_24H | Open Book | Probabilistic Forecast | 5 | 11 | 0.532 | 0.865 | 60% | 40% | 40% | 1.000 | 0% | 0% | 100% | 11 |
| #3 | Grok 4.3 | xAI | T_2H | Open Book | Probabilistic Forecast | 5 | 11 | 0.531 | 0.872 | 60% | 40% | 40% | 0.600 | 0% | 0% | 100% | 11 |
| #3 | Mistral Large 2512 | Mistral | STAGE_OPENING | Open Book | Probabilistic Forecast | 5 | 11 | 0.586 | 0.963 | 60% | 40% | 40% | 1.000 | 0% | 3% | 100% | 11 |
| #17 | Claude Fable 5 | Anthropic | T_24H | Closed Book | Probabilistic Forecast | 5 | 10 | 0.641 | 1.037 | 40% | 40% | 40% | 1.000 | 30% | 0% | - | 10 |
| #18 | Mistral Large 2512 | Mistral | T_2H | Open Book | Direct Score | 5 | 9 | 0.578 | 0.950 | 60% | 20% | 40% | 1.000 | 0% | 0% | 100% | 9 |
| #18 | Mistral Large 2512 | Mistral | T_2H | Open Book | Probabilistic Forecast | 5 | 9 | 0.555 | 0.916 | 60% | 20% | 40% | 1.000 | 0% | 17% | 100% | 9 |
| #20 | Claude Fable 5 | Anthropic | T_24H | Open Book | Direct Score | 5 | 8 | 0.644 | 1.025 | 60% | 20% | 40% | 1.400 | 30% | 0% | 20% | 8 |
| #20 | Grok 4.3 | xAI | T_2H | Open Book | Direct Score | 5 | 8 | 0.521 | 0.860 | 60% | 20% | 40% | 1.000 | 0% | 0% | 100% | 8 |
| #20 | Mistral Large 2512 | Mistral | T_24H | Open Book | Direct Score | 5 | 8 | 0.599 | 0.982 | 60% | 20% | 40% | 1.400 | 0% | 0% | 100% | 8 |
| #23 | Claude Fable 5 | Anthropic | STAGE_OPENING | Closed Book | Probabilistic Forecast | 5 | 7 | 0.653 | 1.056 | 40% | 20% | 20% | 1.400 | 0% | 1% | - | 7 |
| #23 | Claude Fable 5 | Anthropic | T_2H | Closed Book | Probabilistic Forecast | 4 | 7 | 0.548 | 0.934 | 50% | 25% | 25% | 1.500 | 33% | 0% | - | 7 |
| #23 | DeepSeek V4 Pro | DeepSeek | T_24H | Closed Book | Direct Score | 5 | 7 | 0.593 | 0.966 | 80% | 20% | 20% | 1.200 | 0% | 0% | - | 7 |
| #23 | DeepSeek V4 Pro | DeepSeek | STAGE_OPENING | Closed Book | Probabilistic Forecast | 5 | 7 | 0.601 | 0.972 | 40% | 20% | 20% | 1.000 | 0% | 0% | - | 7 |
| #23 | DeepSeek V4 Pro | DeepSeek | T_2H | Closed Book | Probabilistic Forecast | 5 | 7 | 0.576 | 0.945 | 60% | 20% | 20% | 1.000 | 0% | 0% | - | 7 |
| #23 | DeepSeek V4 Pro | DeepSeek | STAGE_OPENING | Open Book | Direct Score | 5 | 7 | 0.615 | 0.998 | 60% | 20% | 20% | 1.200 | 0% | 1% | 100% | 7 |
| #23 | DeepSeek V4 Pro | DeepSeek | T_24H | Open Book | Probabilistic Forecast | 5 | 7 | 0.652 | 1.044 | 60% | 20% | 40% | 1.200 | 0% | 0% | 100% | 7 |
| #23 | GPT-5.5 | OpenAI | STAGE_OPENING | Closed Book | Direct Score | 5 | 7 | 0.611 | 1.000 | 40% | 20% | 20% | 1.400 | 0% | 0% | - | 7 |
| #23 | GPT-5.5 | OpenAI | T_24H | Closed Book | Direct Score | 5 | 7 | 0.621 | 1.025 | 40% | 20% | 20% | 1.400 | 0% | 0% | - | 7 |
| #23 | GPT-5.5 | OpenAI | T_2H | Closed Book | Direct Score | 5 | 7 | 0.604 | 0.998 | 40% | 20% | 20% | 1.400 | 0% | 0% | - | 7 |
| #23 | GPT-5.5 | OpenAI | STAGE_OPENING | Closed Book | Probabilistic Forecast | 5 | 7 | 0.622 | 1.021 | 40% | 20% | 20% | 1.400 | 0% | 0% | - | 7 |
| #23 | GPT-5.5 | OpenAI | T_24H | Closed Book | Probabilistic Forecast | 5 | 7 | 0.633 | 1.033 | 40% | 20% | 20% | 1.400 | 0% | 0% | - | 7 |
| #23 | GPT-5.5 | OpenAI | T_2H | Closed Book | Probabilistic Forecast | 5 | 7 | 0.625 | 1.027 | 40% | 20% | 20% | 1.400 | 0% | 0% | - | 7 |
| #23 | Grok 4.3 | xAI | T_24H | Open Book | Direct Score | 5 | 7 | 0.527 | 0.866 | 60% | 20% | 20% | 1.000 | 0% | 0% | 100% | 7 |
| #23 | Mistral Large 2512 | Mistral | STAGE_OPENING | Open Book | Direct Score | 5 | 7 | 0.563 | 0.937 | 60% | 20% | 20% | 1.200 | 0% | 3% | 100% | 7 |
| #38 | Claude Fable 5 | Anthropic | T_2H | Closed Book | Direct Score | 4 | 6 | 0.539 | 0.918 | 50% | 25% | 25% | 1.500 | 33% | 0% | - | 6 |
| #38 | Claude Fable 5 | Anthropic | T_24H | Open Book | Probabilistic Forecast | 5 | 6 | 0.656 | 1.041 | 60% | 20% | 20% | 1.200 | 30% | 0% | 20% | 6 |
| #38 | Claude Opus 4.8 | Anthropic | STAGE_OPENING | Closed Book | Direct Score | 5 | 6 | 0.648 | 1.049 | 40% | 20% | 20% | 1.000 | 0% | 0% | - | 6 |
| #38 | Claude Opus 4.8 | Anthropic | T_24H | Closed Book | Direct Score | 5 | 6 | 0.648 | 1.049 | 40% | 20% | 20% | 1.000 | 0% | 0% | - | 6 |
| #38 | Claude Opus 4.8 | Anthropic | T_2H | Closed Book | Direct Score | 5 | 6 | 0.652 | 1.055 | 40% | 20% | 20% | 1.000 | 0% | 0% | - | 6 |
| #38 | Claude Opus 4.8 | Anthropic | STAGE_OPENING | Closed Book | Probabilistic Forecast | 5 | 6 | 0.648 | 1.049 | 40% | 20% | 20% | 1.000 | 0% | 0% | - | 6 |
| #38 | Claude Opus 4.8 | Anthropic | T_24H | Closed Book | Probabilistic Forecast | 5 | 6 | 0.652 | 1.056 | 40% | 20% | 20% | 1.000 | 0% | 0% | - | 6 |
| #38 | Claude Opus 4.8 | Anthropic | T_2H | Closed Book | Probabilistic Forecast | 5 | 6 | 0.657 | 1.062 | 40% | 20% | 20% | 1.000 | 0% | 0% | - | 6 |
| #38 | Claude Opus 4.8 | Anthropic | STAGE_OPENING | Open Book | Direct Score | 5 | 6 | 0.621 | 1.000 | 60% | 20% | 20% | 1.200 | 0% | 0% | 88% | 6 |
| #38 | DeepSeek V4 Pro | DeepSeek | T_24H | Open Book | Direct Score | 5 | 6 | 0.667 | 1.070 | 60% | 20% | 20% | 0.800 | 0% | 10% | 100% | 6 |
| #38 | DeepSeek V4 Pro | DeepSeek | T_2H | Open Book | Direct Score | 5 | 6 | 0.661 | 1.058 | 60% | 20% | 20% | 0.800 | 0% | 0% | 100% | 6 |
| #38 | DeepSeek V4 Pro | DeepSeek | STAGE_OPENING | Open Book | Probabilistic Forecast | 5 | 6 | 0.652 | 1.049 | 60% | 20% | 20% | 1.200 | 0% | 8% | 99% | 6 |
| #38 | DeepSeek V4 Pro | DeepSeek | T_2H | Open Book | Probabilistic Forecast | 5 | 6 | 0.670 | 1.070 | 60% | 20% | 20% | 0.800 | 0% | 0% | 100% | 6 |
| #38 | Gemini 3.1 Pro | | STAGE_OPENING | Closed Book | Probabilistic Forecast | 5 | 6 | 0.588 | 0.962 | 60% | 20% | 20% | 1.200 | 0% | 0% | - | 6 |
| #38 | Gemini 3.1 Pro | | STAGE_OPENING | Open Book | Direct Score | 5 | 6 | 0.665 | 1.063 | 60% | 20% | 20% | 1.200 | 0% | 0% | 35% | 6 |
| #38 | Gemini 3.1 Pro | | T_24H | Open Book | Direct Score | 5 | 6 | 0.663 | 1.050 | 60% | 20% | 20% | 1.200 | 0% | 0% | 30% | 6 |
| #38 | Gemini 3.1 Pro | | STAGE_OPENING | Open Book | Probabilistic Forecast | 5 | 6 | 0.667 | 1.066 | 60% | 20% | 20% | 1.200 | 0% | 0% | 36% | 6 |
| #38 | Gemini 3.1 Pro | | T_24H | Open Book | Probabilistic Forecast | 5 | 6 | 0.667 | 1.057 | 60% | 20% | 20% | 1.200 | 0% | 0% | 30% | 6 |
| #38 | Grok 4.3 | xAI | T_24H | Closed Book | Probabilistic Forecast | 5 | 6 | 0.562 | 0.938 | 60% | 20% | 20% | 1.200 | 0% | 0% | - | 6 |
| #38 | Grok 4.3 | xAI | T_2H | Closed Book | Probabilistic Forecast | 5 | 6 | 0.547 | 0.923 | 60% | 20% | 20% | 1.200 | 0% | 0% | - | 6 |
| #38 | Grok 4.3 | xAI | STAGE_OPENING | Open Book | Direct Score | 5 | 6 | 0.570 | 0.927 | 60% | 20% | 20% | 1.000 | 0% | 0% | 100% | 6 |
| #38 | Qwen 3.7 Max | Qwen | STAGE_OPENING | Open Book | Direct Score | 5 | 6 | 0.540 | 0.905 | 60% | 20% | 20% | 1.200 | 0% | 0% | 97% | 6 |
| #38 | Qwen 3.7 Max | Qwen | T_2H | Open Book | Direct Score | 5 | 6 | 0.603 | 0.971 | 60% | 20% | 20% | 1.400 | 0% | 0% | 100% | 6 |
| #61 | Mistral Large 2512 | Mistral | STAGE_OPENING | Closed Book | Direct Score | 5 | 5 | 0.620 | 1.025 | 60% | 20% | 20% | 1.000 | 0% | 0% | - | 5 |
| #61 | Mistral Large 2512 | Mistral | T_24H | Closed Book | Direct Score | 5 | 5 | 0.620 | 1.025 | 60% | 20% | 20% | 1.000 | 0% | 0% | - | 5 |
| #61 | Mistral Large 2512 | Mistral | T_2H | Closed Book | Direct Score | 5 | 5 | 0.635 | 1.047 | 60% | 20% | 20% | 1.000 | 0% | 0% | - | 5 |
| #61 | Mistral Large 2512 | Mistral | STAGE_OPENING | Closed Book | Probabilistic Forecast | 5 | 5 | 0.620 | 1.025 | 60% | 20% | 20% | 1.000 | 0% | 1% | - | 5 |
| #61 | Mistral Large 2512 | Mistral | T_24H | Closed Book | Probabilistic Forecast | 5 | 5 | 0.620 | 1.025 | 60% | 20% | 20% | 1.000 | 0% | 0% | - | 5 |
| #61 | Mistral Large 2512 | Mistral | T_2H | Closed Book | Probabilistic Forecast | 5 | 5 | 0.638 | 1.047 | 60% | 20% | 20% | 1.000 | 0% | 0% | - | 5 |
| #61 | Mistral Large 2512 | Mistral | T_24H | Open Book | Probabilistic Forecast | 5 | 5 | 0.624 | 1.016 | 60% | 20% | 20% | 1.000 | 0% | 10% | 100% | 5 |
| #68 | Claude Opus 4.8 | Anthropic | T_2H | Open Book | Direct Score | 5 | 4 | 0.635 | 1.019 | 60% | 0% | 20% | 1.600 | 0% | 0% | 83% | 4 |
| #68 | Claude Opus 4.8 | Anthropic | T_24H | Open Book | Probabilistic Forecast | 5 | 4 | 0.617 | 0.989 | 60% | 0% | 20% | 1.600 | 0% | 0% | 100% | 4 |
| #68 | GPT-5.5 | OpenAI | T_24H | Open Book | Direct Score | 5 | 4 | 0.674 | 1.073 | 60% | 0% | 20% | 1.600 | 0% | 0% | 60% | 4 |
| #68 | Qwen 3.7 Max | Qwen | T_2H | Open Book | Probabilistic Forecast | 5 | 4 | 0.560 | 0.915 | 60% | 0% | 20% | 1.800 | 0% | 0% | 100% | 4 |
| #72 | Claude Fable 5 | Anthropic | STAGE_OPENING | Open Book | Direct Score | 5 | 2 | 0.654 | 1.044 | 60% | 0% | 0% | 1.400 | 0% | 1% | 47% | 2 |
| #72 | Claude Fable 5 | Anthropic | T_2H | Open Book | Direct Score | 4 | 2 | 0.496 | 0.849 | 75% | 0% | 0% | 1.750 | 33% | 0% | 17% | 2 |
| #72 | Claude Fable 5 | Anthropic | STAGE_OPENING | Open Book | Probabilistic Forecast | 5 | 2 | 0.672 | 1.073 | 60% | 0% | 0% | 1.400 | 0% | 1% | 46% | 2 |
| #72 | Claude Fable 5 | Anthropic | T_2H | Open Book | Probabilistic Forecast | 4 | 2 | 0.513 | 0.872 | 75% | 0% | 0% | 1.750 | 33% | 0% | 33% | 2 |
| #72 | Claude Opus 4.8 | Anthropic | T_24H | Open Book | Direct Score | 5 | 2 | 0.600 | 0.967 | 60% | 0% | 0% | 1.400 | 0% | 0% | 90% | 2 |
| #72 | Claude Opus 4.8 | Anthropic | STAGE_OPENING | Open Book | Probabilistic Forecast | 5 | 2 | 0.617 | 0.996 | 60% | 0% | 0% | 1.400 | 0% | 0% | 99% | 2 |
| #72 | Claude Opus 4.8 | Anthropic | T_2H | Open Book | Probabilistic Forecast | 5 | 2 | 0.636 | 1.015 | 60% | 0% | 0% | 1.400 | 0% | 0% | 67% | 2 |
| #72 | DeepSeek V4 Pro | DeepSeek | T_24H | Closed Book | Probabilistic Forecast | 5 | 2 | 0.604 | 0.983 | 60% | 0% | 0% | 1.400 | 0% | 0% | - | 2 |
| #72 | Gemini 3.1 Pro | | T_2H | Open Book | Direct Score | 5 | 2 | 0.670 | 1.061 | 60% | 0% | 0% | 1.400 | 0% | 0% | 50% | 2 |
| #72 | Gemini 3.1 Pro | | T_2H | Open Book | Probabilistic Forecast | 5 | 2 | 0.661 | 1.051 | 60% | 0% | 0% | 1.400 | 0% | 0% | 50% | 2 |
| #72 | GPT-5.5 | OpenAI | STAGE_OPENING | Open Book | Direct Score | 5 | 2 | 0.656 | 1.051 | 60% | 0% | 0% | 1.400 | 0% | 0% | 78% | 2 |
| #72 | GPT-5.5 | OpenAI | T_2H | Open Book | Direct Score | 5 | 2 | 0.673 | 1.069 | 60% | 0% | 0% | 1.400 | 0% | 0% | 50% | 2 |
| #72 | GPT-5.5 | OpenAI | STAGE_OPENING | Open Book | Probabilistic Forecast | 5 | 2 | 0.638 | 1.021 | 60% | 0% | 0% | 1.400 | 0% | 0% | 69% | 2 |
| #72 | GPT-5.5 | OpenAI | T_24H | Open Book | Probabilistic Forecast | 5 | 2 | 0.611 | 0.982 | 60% | 0% | 0% | 1.600 | 0% | 0% | 60% | 2 |
| #72 | GPT-5.5 | OpenAI | T_2H | Open Book | Probabilistic Forecast | 5 | 2 | 0.667 | 1.058 | 60% | 0% | 0% | 1.400 | 0% | 0% | 33% | 2 |
| #72 | Qwen 3.7 Max | Qwen | STAGE_OPENING | Closed Book | Direct Score | 5 | 2 | 0.609 | 0.993 | 40% | 0% | 0% | 1.600 | 0% | 0% | - | 2 |
| #72 | Qwen 3.7 Max | Qwen | T_24H | Closed Book | Direct Score | 5 | 2 | 0.644 | 1.030 | 40% | 0% | 0% | 1.600 | 0% | 0% | - | 2 |
| #72 | Qwen 3.7 Max | Qwen | T_2H | Closed Book | Direct Score | 5 | 2 | 0.593 | 0.972 | 60% | 0% | 0% | 1.400 | 0% | 0% | - | 2 |
| #72 | Qwen 3.7 Max | Qwen | STAGE_OPENING | Closed Book | Probabilistic Forecast | 5 | 2 | 0.577 | 0.947 | 60% | 0% | 0% | 1.600 | 0% | 0% | - | 2 |
| #72 | Qwen 3.7 Max | Qwen | T_24H | Closed Book | Probabilistic Forecast | 5 | 2 | 0.578 | 0.947 | 60% | 0% | 0% | 1.600 | 0% | 0% | - | 2 |
| #72 | Qwen 3.7 Max | Qwen | T_2H | Closed Book | Probabilistic Forecast | 5 | 2 | 0.614 | 0.986 | 60% | 0% | 0% | 1.600 | 0% | 0% | - | 2 |
| #72 | Qwen 3.7 Max | Qwen | T_24H | Open Book | Direct Score | 5 | 2 | 0.608 | 0.985 | 60% | 0% | 0% | 1.200 | 0% | 0% | 100% | 2 |
| #72 | Qwen 3.7 Max | Qwen | STAGE_OPENING | Open Book | Probabilistic Forecast | 5 | 2 | 0.639 | 1.023 | 60% | 0% | 0% | 1.600 | 0% | 0% | 99% | 2 |
| #95 | Grok 4.3 | xAI | STAGE_OPENING | Open Book | Probabilistic Forecast | 5 | 1 | 0.574 | 0.920 | 60% | 0% | 0% | 1.200 | 0% | 0% | 100% | 1 |
| #95 | Qwen 3.7 Max | Qwen | T_24H | Open Book | Probabilistic Forecast | 5 | 1 | 0.649 | 1.051 | 60% | 0% | 0% | 1.400 | 0% | 0% | 90% | 1 |
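
The page does not say how the Rank, Brier, or Log loss columns are computed, so the following is only a sketch under stated assumptions. The rank labels in the table are consistent with standard competition ranking on Points: tied configurations share a rank and the next rank skips ahead by the size of the tie. For the two probabilistic scores, the sketch assumes the common multiclass forms over three match outcomes (home win, draw, away win). The function names and the three-outcome layout are hypothetical and not taken from the benchmark.

```python
import math


def competition_ranks(points):
    """Standard competition ranking: ties share a rank, the next rank skips.

    Assumption: ranks are derived from the Points column alone.
    """
    ordered = sorted(points, reverse=True)
    ranks = []
    for i, p in enumerate(ordered):
        if i > 0 and p == ordered[i - 1]:
            ranks.append(ranks[-1])  # same points as the row above: same rank
        else:
            ranks.append(i + 1)      # position in the sorted list, 1-based
    return ranks


def brier_multiclass(probs, outcome_index):
    """Multiclass Brier score: squared error against the one-hot outcome,
    summed over classes (assumed form; lower is better)."""
    return sum(
        (p - (1.0 if i == outcome_index else 0.0)) ** 2
        for i, p in enumerate(probs)
    )


def log_loss(probs, outcome_index, eps=1e-15):
    """Negative log of the probability given to the realized outcome."""
    return -math.log(max(probs[outcome_index], eps))


# Points of the first 19 table rows: 16, 15, fourteen 11s, 10, 9, 9.
top_points = [16, 15] + [11] * 14 + [10, 9, 9]
ranks = competition_ranks(top_points)
print(ranks[0], ranks[1], ranks[2], ranks[16], ranks[17], ranks[18])
# prints: 1 2 3 17 18 18

# Reference point: a uniform forecast over three outcomes.
uniform = [1 / 3, 1 / 3, 1 / 3]
print(round(brier_multiclass(uniform, 0), 3), round(log_loss(uniform, 0), 3))
# prints: 0.667 1.099
```

Under these assumed definitions, a forecast that spreads probability evenly over the three outcomes scores about 0.667 (Brier) and 1.099 (log loss), which gives a rough reference point for reading the two columns.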