
OpenAI has achieved a groundbreaking milestone with its GPT-5.2 Pro model, setting a new high score of 31% on the extremely challenging FrontierMath Tier 4 benchmark.
This result, reported by Epoch AI, marks a substantial improvement over the previous best score of 19%, demonstrating rapid progress in advanced mathematical reasoning capabilities among frontier AI systems.
FrontierMath, developed by Epoch AI, stands out as one of the toughest independent evaluations for large language models.
> New record on FrontierMath Tier 4! GPT-5.2 Pro scored 31%, a substantial jump over the previous high score of 19%. Read on for details, including comments from mathematicians. pic.twitter.com/i5nj1kTcMt
> — Epoch AI (@EpochAIResearch) January 23, 2026
It consists of hundreds of original, unpublished problems created and vetted by expert mathematicians, designed to test genuine research-level thinking rather than memorized patterns.
Understanding FrontierMath Tiers
The benchmark is structured into four difficulty tiers:
- Tiers 1–3: Cover undergraduate to early graduate-level mathematics, where top models now achieve near-perfect performance.
- Tier 4: Contains 50 exceptionally hard problems at the frontier of mathematical research. These often require hours or days of focused effort even from professional mathematicians and postdocs.
Tier 4 problems are deliberately crafted to push beyond current AI limits, making the 31% score by GPT-5.2 Pro particularly significant.
Just six months earlier, leading models hovered around 4% on this same tier, highlighting the explosive pace of improvement in late 2025 and early 2026.
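In concrete terms, assuming scores are straight fractions of the 50-problem set, 31% corresponds to roughly 15–16 problems solved, versus about 9–10 at the prior 19% mark and only around 2 at the ~4% level of mid-2025.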
Key Performance Comparison
Here is how GPT-5.2 Pro stacks up against recent top contenders on FrontierMath Tier 4 (accuracy percentages):
| Model | Accuracy | Notes |
|---|---|---|
| GPT-5.2 Pro | 31% | New record holder (Epoch AI evaluation) |
| Previous best (various) | 19% | Substantial prior high mark |
| Gemini 3 Pro | ~15–18% | Strong but now clearly surpassed |
| GPT-5.2 (xhigh setting) | ~12–15% | Lower configuration of the same family |
The chart accompanying the announcement shows GPT-5.2 Pro with a clear lead, including error bars indicating statistical confidence.
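As a rough illustration of what those error bars could represent, assuming they are standard binomial confidence intervals over the 50-problem set (Epoch AI has not published its exact methodology here, so this is an assumption), a Wilson score interval for a score in this range is quite wide:

```python
from math import sqrt

def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score confidence interval for a binomial proportion."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return center - half, center + half

# Hypothetical back-calculation: ~31% of 50 problems is roughly 15-16 solved.
low, high = wilson_interval(successes=15, trials=50)
print(f"point estimate: {15/50:.0%}, 95% CI: [{low:.0%}, {high:.0%}]")
# -> point estimate: 30%, 95% CI: [19%, 44%]
```

An interval this wide underlines why Epoch AI reports error bars at all: with only 50 problems, each additional solution moves the headline number by two percentage points.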
This performance gap suggests meaningful architectural or training advances, not just marginal gains.
Why This Matters for AI Development and Users
Reaching 31% on Tier 4 is more than a leaderboard win: it signals stronger general reasoning, abstraction, and problem-solving abilities that can transfer to real-world scientific and engineering challenges.
OpenAI has emphasized that these improvements reflect deeper understanding rather than narrow benchmark optimization.
For everyday users and professionals, this breakthrough promises:
- More reliable help solving complex math, physics, and engineering problems
- Better performance in STEM education tools and research assistance
- Enhanced capabilities in fields like cryptography, theoretical physics, and advanced data analysis
As models continue scaling and refining reasoning techniques, benchmarks like FrontierMath help track genuine progress toward AI systems that can contribute meaningfully to mathematical discovery.
With GPT-5.2 Pro now leading the pack on one of the hardest known evaluations, the frontier of AI mathematical intelligence is expanding faster than ever.
Researchers, educators, and developers should watch closely; the next wave of breakthroughs in science and technology may arrive sooner than anticipated.