O3 beat r1 again on coding

astrax · January 31, 2025, 11:49pm

OpenAI launched o3-mini and o3-mini-high today, so I decided to test their coding capabilities by solving LeetCode 3435, the final problem from last week’s weekly contest. The results were surprising.

Two days ago, I attempted the same problem using o1 and DeepSeek r1, but neither could pass all test cases. Today, I ran it again with DeepSeek r1 and o3-mini-high, and the difference was significant:

• o3-mini-high: 2m 43s, passed all test cases

• DeepSeek r1: 6m 15s, passed 384 / 808 test cases

It’s possible that o3 encountered this problem during pre-training, but the improvement in speed and accuracy is still remarkable. AI coding models are evolving rapidly, and this seems like a major step forward.

PixelNomad · February 1, 2025, 2:42am

That’s amazing! Thanks for sharing.

btw how are you selecting o3-mini-high in Cursor? I only see o3-mini

younio · February 1, 2025, 5:51am

utku · February 1, 2025, 6:13am

sadly this hasn’t been my experience so far. i’ve tried both o3-mini and r1 for some hours on a relatively small codebase (around 30k loc). r1 performed better imo. still too early to tell tho.

update after spending more time with both:

yoo, i stand corrected, o3-mini got some moves ngl

joefaron · February 1, 2025, 5:01pm

where are all the hypers who were promoting r1 for the last couple of weeks?

feels a little odd honestly… i was never impressed with r1… deepseek-chat has been equivalent and cheaper for fast little api calls in my app…

just odd with the huge promoting of it here… and then days later nvda crashes because everyone’s afraid of deepseek… and given the deepseek founder’s background in investing… seems like a nice guy but just an odd couple weeks.

leoing · February 1, 2025, 5:34pm

R1’s thinking tokens are invaluable for debugging derailing prompts.

One-shotting is kind of a myth for real coding.

dannetstudio · February 1, 2025, 6:05pm

@joefaron I think that more than hype, DeepSeek is a combination of it being really good, open source, cheap, openly showing the “reasoning” chain, etc. And in practice and use it is really good to work with. In my opinion it is just the kick-off for a paradigm shift in AI towards open-source in many parts of the chain, which will boost everything at different levels.

Also, I don’t know if we would have had a cheap o3-mini, and so much generosity from OpenAI, if this hadn’t happened.

debian3 · February 1, 2025, 7:06pm

Give it a week and we will say the same about o3-mini
