GPT-4.5 coding benchmarks (worse on SWE-Bench than Haiku!)

For anyone disappointed that GPT-4.5 won’t be included with their subscription: you’re likely not missing much.

Benchmarks don’t tell the complete story, but they’re useful nonetheless.

Haiku 3.5 isn’t included above, but it scores 40.6% on SWE-Bench Verified vs. GPT-4.5’s 38%.

GPT-4.5 scores below DeepSeek V3 on Aider Polyglot despite costing ~540x more!
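
Rough math behind that multiple (a sketch, not an official figure — it assumes GPT-4.5’s launch API pricing of $75/$150 per 1M input/output tokens and DeepSeek V3’s promotional $0.14/$0.28 per 1M; real Aider spend depends on the input/output token mix):

```python
# Cost-ratio sketch. Prices are assumptions (USD per 1M tokens):
# GPT-4.5's launch API pricing vs. DeepSeek V3's promotional pricing.
GPT_45 = {"input": 75.00, "output": 150.00}
DEEPSEEK_V3 = {"input": 0.14, "output": 0.28}

for kind in ("input", "output"):
    ratio = GPT_45[kind] / DEEPSEEK_V3[kind]
    print(f"{kind}: ~{ratio:.0f}x more expensive")

# input: ~536x more expensive
# output: ~536x more expensive
```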

It’s now clear why OAI didn’t include o3-mini (high) in their recently released SWE-Lancer benchmark. :melting_face: They tested the old version of Sonnet 3.5, and it scored 36.1% ($208K).

OAI has also stated they may discontinue serving GPT-4.5 in the API.
(It likely has ~5-10 trillion parameters vs. ~250B [175-400B range] for Sonnet 3.5.)

If anyone burned the cash trying it with their own API key, it would be great to get a ‘vibe-check.’ :v: