Zelili AI

Grok 4.20 ForecastBench Ranking Leaderboard

xAI’s Grok 4.20 preview already ranks #2 on ForecastBench, ahead of almost every major model. Traders and forecasters are taking notice, and the full version is expected in mid-February.


xAI’s latest Grok 4.20 preview has reached second place on the ForecastBench AI forecasting leaderboard with an overall score of 0.102.

This positions it ahead of major competitors like OpenAI’s GPT-5, Google’s Gemini 3 Pro Preview, and Anthropic’s Claude Opus 4.5, while narrowing the gap with elite human superforecasters at 0.085.

The result highlights Grok’s advancing capabilities in probabilistic reasoning and future event prediction, even as training of the full model has been delayed by winter storms affecting xAI’s Colossus supercluster in Memphis.

ForecastBench serves as a rigorous benchmark for evaluating AI models’ forecasting accuracy across diverse datasets and real-world market scenarios.

It measures performance using metrics like Brier scores, where lower values indicate better calibration and sharpness in predictions.
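To make the metric concrete, here is a minimal sketch of the standard Brier score for binary questions: the mean squared difference between the forecast probability and the outcome (1 if the event happened, 0 if not). The function name and the three sample questions are hypothetical, and ForecastBench's own scoring pipeline may apply adjustments that this sketch does not model.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    if not forecasts:
        raise ValueError("at least one forecast is required")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical resolved questions: 1 = event happened, 0 = it did not.
outcomes = [1, 0, 1]

# A forecaster who commits to probabilities close to the eventual outcomes.
confident = brier_score([0.9, 0.1, 0.8], outcomes)

# A forecaster who always answers 50%, which scores 0.25 on every question.
hedged = brier_score([0.5, 0.5, 0.5], outcomes)

print(f"confident: {confident:.3f}")
print(f"hedged:    {hedged:.3f}")
```

A score of 0 would be a perfect forecast, and always answering 50% yields 0.25, which is why the leaderboard values around 0.1 sit well below that baseline.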

For users interested in AI-driven decision-making, this leaderboard offers valuable insights into which models excel at anticipating outcomes in fields such as economics, geopolitics, and technology trends.

Breaking Down the Leaderboard Rankings

Here’s a summary of the top positions on the ForecastBench tournament leaderboard as of late January 2026:

Rank | Organization | Model/Team | Overall Score | Dataset Score | Market Score
1 | FRI | Superforecaster median forecast | 0.085 | 0.131 | 0.040
2 | xAI | Grok 4.20 (Preview) | 0.102 | 0.142 | 0.062
2 | Cassi | ensemble_2_crowdadj | 0.102 | N/A | N/A
4 | FRI | GPT-5-2025-08-07 (zero shot with crowd forecast) | 0.104 | N/A | N/A
5 | Lightning Rod | Foresight-32B | 0.106 | N/A | N/A
5 | FRI | Gemini-3-Pro-Preview (zero shot with crowd forecast) | 0.106 | N/A | N/A

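These scores reflect combined performance on resolved questions, and lower is better. The leaderboard also reports confidence intervals, which indicate how much statistical uncertainty surrounds each score. Grok 4.20 posts 0.142 on dataset questions and 0.062 on market questions, close to but still behind the superforecaster median (0.131 and 0.040) in both categories.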
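The figures in the table can be checked with a little arithmetic. The sketch below makes two observations, both inferred from the published numbers rather than from a formula stated in this article: the overall score appears consistent with a simple mean of the dataset and market scores, and the rank column follows standard competition ranking, where tied scores share a rank and the next rank is skipped. The helper names are hypothetical.

```python
def mean_of_components(dataset_score, market_score):
    """Assumed composition: equal-weighted mean of the two component scores."""
    return (dataset_score + market_score) / 2


def competition_ranks(scores):
    """Rank scores ascending (lower is better); ties share the same rank."""
    ordered = sorted(scores)
    # index() returns the first position of a value, so tied scores get one rank.
    return [ordered.index(s) + 1 for s in scores]


# Rows with published component scores, copied from the table above.
rows = [
    ("Superforecaster median forecast", 0.085, 0.131, 0.040),
    ("Grok 4.20 (Preview)", 0.102, 0.142, 0.062),
]
for name, overall, dataset, market in rows:
    estimate = mean_of_components(dataset, market)
    print(f"{name}: published {overall:.3f}, mean of components {estimate:.4f}")

# Overall scores of all six entries, in table order.
overall_scores = [0.085, 0.102, 0.102, 0.104, 0.106, 0.106]
ranks = competition_ranks(overall_scores)
print(ranks)
```

Because the table values are rounded to three decimals, the recomputed mean can differ from the published overall score in the last digit.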

Navigating the Training Delay

The preview’s success comes amid hurdles for the full Grok 4.20 release. Extreme winter storms in Memphis, Tennessee, caused power outages and infrastructure disruptions at xAI’s Colossus supercluster, pushing completion to mid-February 2026.


Factors included unusually cold temperatures affecting equipment and construction incidents damaging power lines. Despite the setback, recent checkpoints indicate substantial improvements, suggesting the final model could outperform expectations.

For developers and businesses, this delay underscores the vulnerabilities of large-scale AI training to external factors like weather and energy supply. xAI’s rapid scaling, powered by NVIDIA GPUs, aims to mitigate such issues in future iterations.

Grok 4.20’s forecasting prowess opens doors for applications in stock trading, where accurate predictions can inform investment strategies, and prediction markets, enabling better risk assessment.

In finance, it could help analyze market trends, while in other sectors it could support supply chain forecasting or event planning.

  • What is ForecastBench?

    ForecastBench is an AI benchmarking platform that evaluates models on forecasting tasks, using dataset and market questions to measure prediction accuracy with metrics such as the Brier score.

  • Why did Grok 4.20’s training get delayed?

    Extreme cold weather and power outages in Memphis disrupted xAI’s Colossus supercluster, delaying completion to mid-February 2026.

  • How does Grok 4.20 compare to other models?

    It is tied for second with a score of 0.102, ahead of GPT-5 (0.104), Gemini 3 Pro Preview (0.106), and Claude Opus 4.5 (0.107), but behind human superforecasters (0.085).

  • What are potential uses for Grok’s forecasting abilities?

    Potential uses include market predictions for stock trading, risk evaluation in prediction markets, and broader applications such as economic forecasting or strategic planning.