On March 25, 2025, Google officially introduced Gemini 2.5, which it describes as its most intelligent artificial intelligence (AI) model to date. The first release, Gemini 2.5 Pro Experimental, marks a major step forward and sets a new bar for reasoning, programming, and handling complex problems. It seems that Claude 3.7 Sonnet, a formidable opponent in the AI race, can finally take a “rest” now that this newcomer has pulled ahead.
The Power of Gemini 2.5
Gemini 2.5 is not just a routine upgrade; it is a genuine “thinking” model. Unlike earlier models that answered directly from what they learned in training, Gemini 2.5 reasons step by step before giving an answer. In other words, it mimics the way humans solve problems: analyzing the question, weighing candidate solutions, and choosing the best one. As a result, the accuracy of its responses improves significantly, especially on tasks that require complex reasoning.
One of the highlights of Gemini 2.5 is its strong performance on math and science benchmarks such as GPQA and AIME 2025. It also scored 18.8% on “Humanity’s Last Exam”, a dataset designed to probe the limits of human knowledge and reasoning ability. This is the highest result among models that do not use external tools, which speaks to the model's own reasoning power.
In addition, Gemini 2.5 Pro is a “master” in the field of programming. It outperforms previous versions like Gemini 2.0 and even beats Sonnet 3.7 on some important programming tests, such as Aider Polyglot, with a 74% success rate compared to Sonnet 3.7’s 64.9%. From creating polished web applications, to developing “agentic” programs, to editing and transforming source code, Gemini 2.5 shows impressive flexibility and accuracy. Want to quickly code a simple game? Gemini 2.5 can help you do that from a single-line prompt.
Gemini 2.5’s multimodal capabilities are another highlight. With a context window of up to 1 million tokens (soon to be 2 million), the model can take in text, audio, images, video, and even an entire large source code repository in a single request. This opens up practical applications such as big data analytics, rich content creation, and AI assistants that are smarter than ever before.
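To get a rough feel for what a 1-million-token window means for a codebase, the sketch below estimates how many tokens a repository would occupy. It assumes roughly 4 characters per token, which is a common rule of thumb and not Gemini's actual tokenizer; the function names and the list of file extensions are purely illustrative.

```python
import os

CONTEXT_WINDOW_TOKENS = 1_000_000  # window size quoted in this article
CHARS_PER_TOKEN = 4  # assumption: rough heuristic, not the real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate the token count from the character count (rounded up)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def estimate_repo_tokens(root: str, extensions=(".py", ".js", ".ts", ".md")) -> int:
    """Sum the estimated tokens of every matching file under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    total += estimate_tokens(handle.read())
    return total


def fits_in_context(token_count: int, window: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """Return True if the estimated token count fits in the window."""
    return token_count <= window
```

Calling `fits_in_context(estimate_repo_tokens("."))` gives a quick yes-or-no for the current directory. Treat the result as a ballpark only; real token counts depend on the tokenizer and on the kind of content.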
Sonnet 3.7: A Respected Rival Gets a Break
For a long time, Anthropic’s Claude 3.7 Sonnet has been one of the leading AI models, thanks in particular to its extended reasoning capabilities and strong performance in programming and natural language processing. With the arrival of Gemini 2.5, however, it seems that Sonnet 3.7 can finally step back and make room for a new rival from Google.
- Reasoning & Knowledge (Humanity’s Last Exam, no tools)
  - Gemini 2.5 Pro EXP (03-25): 18.8%
  - OpenAI o3-mini High: 14.4%
  - OpenAI GPT-4.5 16k Extended Thinking: 6.4%
  - Claude 3.7 Sonnet Extended Thinking: 8.9%
  - DeepSeek R1: 8.6%
- Science (GPQA Diamond)
  - Gemini 2.5 Pro EXP (03-25): 79.7%
  - OpenAI o3-mini High: 84.8%
  - OpenAI GPT-4.5 64k Extended Thinking: 84.6%
  - Claude 3.7 Sonnet Extended Thinking: 80.2%
  - Grok 3 Beta: 71.5%
  - DeepSeek R1: 78.2%
- Mathematics (AIME 2025)
  - Gemini 2.5 Pro EXP (03-25): 86.7%
  - OpenAI o3-mini High: 93.3%
  - Claude 3.7 Sonnet Extended Thinking: 77.3%
  - Grok 3 Beta: 49.5%
  - DeepSeek R1: 70%
While Sonnet 3.7 still holds the lead on some tests like SWE-Bench (70.3% vs. Gemini 2.5’s 63.8%), Gemini’s strength in other areas, notably multi-language programming and reasoning without test-time techniques, suggests it is poised to reshape the game. This isn’t the end of the road for Sonnet 3.7, but rather a milestone that shows how fierce competition in the AI industry is pushing all the big players forward.
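The two coding results quoted in this article point in opposite directions, so it helps to put the gaps side by side. The snippet below is a small sketch that does the arithmetic on the article's own figures; the dictionary layout and the `leader_and_gap` helper are hypothetical names chosen for illustration.

```python
# Scores (in percent) exactly as quoted in this article.
SCORES = {
    "Aider Polyglot": {"Gemini 2.5 Pro": 74.0, "Claude 3.7 Sonnet": 64.9},
    "SWE-Bench": {"Gemini 2.5 Pro": 63.8, "Claude 3.7 Sonnet": 70.3},
}


def leader_and_gap(results):
    """Return the top-scoring model and its lead in percentage points."""
    ranked = sorted(results.items(), key=lambda item: item[1], reverse=True)
    (leader, top), (_runner_up, second) = ranked[0], ranked[1]
    return leader, round(top - second, 1)


for benchmark, results in SCORES.items():
    leader, gap = leader_and_gap(results)
    print(f"{benchmark}: {leader} leads by {gap} percentage points")
```

On these numbers, Gemini 2.5 Pro is ahead by 9.1 points on Aider Polyglot, while Sonnet 3.7 is ahead by 6.5 points on SWE-Bench. The gaps are in percentage points, not relative improvement.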
Conclusion
Gemini 2.5 is not just a step forward for Google, but also a testament to the rapid pace of AI development. With its strong reasoning capabilities, top-notch programming performance, and powerful multimodal capabilities, it promises to usher in a new era of practical AI applications. Meanwhile, Sonnet 3.7 may be taking a break, but Anthropic and its competitors are certainly not standing still. The AI race is hotter than ever, and we, the users, are the ones who benefit from this competition.