Zelili AI

GPT 5.2 Launches with 100% Math Score, Real Physics, and a Price Hike

OpenAI has officially ended its internal Code Red. Just weeks after Google’s Gemini 3 Pro shook the industry, OpenAI struck back yesterday with the release of GPT 5.2. The update brings a stunning leap in reasoning capabilities. It achieved a perfect score on the AIME 2025 math benchmark. It also introduced physics simulation capabilities that blur the line between code and reality.

I spent the last 24 hours testing the new Thinking model to see if it lives up to the hype. Here is everything you need to know about the new king of AI.

The Thinking Model: A New Standard for Reason

The most shocking metric from this launch is not a speed improvement. It is raw intelligence. GPT 5.2 achieved a 100% score on the AIME 2025 benchmark. For context, this is a grueling math competition where Gemini 3 Pro scored 95% and Claude Opus 4.5 hit 92.8%. GPT 5.2 did not just pass. It solved every single problem correctly.

Even more significant is the result on ARC AGI 2. This benchmark is designed to test abstract reasoning on novel puzzles, so memorized patterns offer little help.

  • GPT 5.1 Score: 17%
  • GPT 5.2 Score: 52.9%

This roughly threefold improvement suggests the model generalizes to unfamiliar problems far better than its predecessor, rather than leaning on memorized patterns.

Real Physics and 3D Understanding

We are used to AI writing code, but GPT 5.2 understands how the physical world moves. I watched a demo by Flavio Adamo where the model generated a 3D hexagon with bouncing balls inside it.

The lighting, collision physics, and brightening effect upon impact looked flawless. The scene was rendered purely from code the model generated in a single shot. In another test by Ethan Mollick, the model built an infinite city of Neo Gothic towers in a stormy ocean. The water physics and wave turbulence reacted realistically to user sliders. These demos suggest the model has a strong working grasp of visual and physical laws.
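To make the collision part of that demo concrete, here is a minimal sketch of a ball bouncing inside a hexagon. It is not the code from Flavio Adamo's demo: it is a simplified 2D version with no rendering or lighting, and every name and constant in it (`hexagon_edges`, `step`, the restitution value, the time step) is an assumption chosen for illustration.

```python
import math


def hexagon_edges(radius):
    """Return (midpoint, inward_unit_normal) for each side of a regular
    hexagon centered at the origin with the given circumradius."""
    verts = [
        (radius * math.cos(math.pi / 3 * k), radius * math.sin(math.pi / 3 * k))
        for k in range(6)
    ]
    edges = []
    for k in range(6):
        (x1, y1), (x2, y2) = verts[k], verts[(k + 1) % 6]
        mx, my = (x1 + x2) / 2, (y1 + y2) / 2
        length = math.hypot(mx, my)
        # For a regular polygon the midpoint vector is perpendicular to the
        # side, so its negation is the inward-pointing normal.
        edges.append(((mx, my), (-mx / length, -my / length)))
    return edges


def step(pos, vel, edges, ball_radius, dt, gravity=-9.8, restitution=0.9):
    """Advance the ball by one time step and bounce it off any wall it hits."""
    vx, vy = vel[0], vel[1] + gravity * dt
    x, y = pos[0] + vx * dt, pos[1] + vy * dt
    for (px, py), (nx, ny) in edges:
        # Signed distance from the wall; positive means inside the hexagon.
        dist = (x - px) * nx + (y - py) * ny
        if dist < ball_radius:
            # Push the ball back inside along the wall normal.
            x += (ball_radius - dist) * nx
            y += (ball_radius - dist) * ny
            # Reflect the velocity component that points into the wall.
            vn = vx * nx + vy * ny
            if vn < 0:
                vx -= (1 + restitution) * vn * nx
                vy -= (1 + restitution) * vn * ny
    return (x, y), (vx, vy)


if __name__ == "__main__":
    edges = hexagon_edges(1.0)
    pos, vel = (0.0, 0.0), (1.3, 0.7)
    for _ in range(300):
        pos, vel = step(pos, vel, edges, ball_radius=0.05, dt=0.01)
    print("position after 3 simulated seconds:", pos)
```

The sketch assumes the ball is small relative to the hexagon and the time step is short, so the ball can only touch neighboring walls at once. A full 3D scene with lighting, like the one in the demo, needs far more than this.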

Reliability for Business: The Cap Table Test

OpenAI is pitching this model heavily for economically valuable work. My tests confirm why. In a direct comparison, the previous GPT 5.1 model failed a complex capitalization table task. It incorrectly calculated Series A and B liquidation preferences. That is a mistake that could cost a startup millions.

GPT 5.2 handled the same request perfectly. It correctly calculated the payouts and formatted the Excel file for immediate human review. For finance and legal professionals, this reliability upgrade is the killer feature.
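For readers unfamiliar with liquidation preferences, the sketch below shows the kind of calculation involved. It is not the task used in the comparison. It assumes the simplest possible structure: 1x non-participating preferences, Series B senior to Series A, and no preferred holder converting to common. All dollar amounts are hypothetical.

```python
def liquidation_waterfall(exit_value, preferences):
    """Pay out stacked liquidation preferences, most senior first.

    preferences: list of (series_name, preference_amount), senior first.
    Whatever is left after the preferences goes to common holders.
    Assumes non-participating preferred that does not convert.
    """
    remaining = exit_value
    payouts = {}
    for name, amount in preferences:
        paid = min(remaining, amount)  # a series cannot receive more than is left
        payouts[name] = paid
        remaining -= paid
    payouts["Common"] = remaining
    return payouts


# Hypothetical cap table: Series B invested $8M, Series A invested $5M.
prefs = [("Series B", 8_000_000), ("Series A", 5_000_000)]

if __name__ == "__main__":
    print(liquidation_waterfall(12_000_000, prefs))
    print(liquidation_waterfall(30_000_000, prefs))
```

In the $12M exit, Series B is made whole, Series A recovers only $4M of its $5M, and common holders get nothing. Real cap tables add participation rights, caps, and conversion decisions, which is where errors tend to creep in.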

Benchmark Breakdown: GPT 5.2 vs The World

Here is how the new model stacks up against its fiercest rivals:

Benchmark                | GPT-5.2 Thinking | Gemini 3 Pro | Claude Opus 4.5
AIME 2025 (Math)         | 100%             | 95.0%        | 92.8%
ARC-AGI-2 (Intelligence) | 52.9%            | 31.1%        | 37.6%
SWE-bench Pro (Coding)   | 55.6%            | 43.3%        | 52.0%
GPQA Diamond (Science)   | 92.4%            | 91.9%        | 87.0%
Tool Use (TA2 Bench)     | 98.7%            | N/A          | N/A

The Cost of Intelligence

The main downside of this upgrade is the price tag. OpenAI has raised API prices for GPT 5.2 by 40% on both input and output tokens, which suggests that Thinking models require significantly more compute.

  • Input Cost: $1.75 per million tokens (up from $1.25).
  • Output Cost: $14.00 per million tokens (up from $10.00).
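To see what those rates mean for a single request, here is a small sketch that applies the prices listed above. The dictionary keys are just labels for this example, not necessarily the identifiers used by the API, and the token counts are hypothetical.

```python
# Prices in USD per million tokens, taken from the figures above.
PRICES = {
    "gpt-5.1": {"input": 1.25, "output": 10.00},
    "gpt-5.2": {"input": 1.75, "output": 14.00},
}


def request_cost(model, input_tokens, output_tokens):
    """Cost in USD of one request at the listed per-million-token rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


def percent_increase(old, new):
    """Relative price change from old to new, in percent."""
    return (new - old) / old * 100


if __name__ == "__main__":
    # Hypothetical request: 10,000 input tokens and 2,000 output tokens.
    for model in PRICES:
        print(model, round(request_cost(model, 10_000, 2_000), 4))
    print("input increase %:", percent_increase(1.25, 1.75))
    print("output increase %:", percent_increase(10.00, 14.00))
```

Both rates rise by the same 40%, so any request costs 1.4 times what it did before if the token counts stay the same. Thinking models may also emit more output tokens per answer, which this sketch does not model.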

While the base cost is higher, the efficiency is undeniable. Solving a complex ARC AGI task now costs roughly $11. That is down from an estimated $4,500 just a year ago. You pay more per token, but you need far fewer attempts to get the right answer.
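The trade-off between price per attempt and number of attempts can be sketched with a simple expected-cost model. This is an illustration, not OpenAI's accounting: it assumes every attempt costs the same and succeeds independently with a fixed probability, and the success rates below are made up.

```python
def cost_per_solved_task(cost_per_attempt, success_rate):
    """Expected spend to get one correct answer.

    With independent attempts, the expected number of attempts
    until the first success is 1 / success_rate.
    """
    return cost_per_attempt / success_rate


def reduction_factor(old_cost, new_cost):
    """How many times cheaper the new cost is than the old one."""
    return old_cost / new_cost


if __name__ == "__main__":
    # Hypothetical: the old model costs 1.00 per attempt and succeeds 50% of
    # the time; the new one costs 1.40 per attempt and succeeds 90% of the time.
    print("old:", cost_per_solved_task(1.00, 0.50))
    print("new:", cost_per_solved_task(1.40, 0.90))
    # ARC AGI task cost figures quoted in the article.
    print("reduction:", round(reduction_factor(4500, 11)))
```

Under this model, a 40% price increase pays for itself whenever the success rate improves by more than 40% in relative terms.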

Final Verdict

GPT 5.2 is available immediately for all paid users on Instant, Thinking, and Pro tiers. If your workflow involves complex math, high stakes financial data, or coding physics simulations, upgrading is no longer optional. It is a requirement to stay competitive.