O3 beat r1 again on coding

OpenAI launched o3-mini and o3-mini-high today, so I decided to test their coding capabilities by solving LeetCode 3435, the final problem from last week’s weekly contest. The results were surprising.

Two days ago, I attempted the same problem using o1 and DeepSeek r1, but neither could pass all test cases. Today, I ran it again with DeepSeek r1 and o3-mini-high, and the difference was significant:

• o3-mini-high: 2m 43s, passed all test cases

• DeepSeek r1: 6m 15s, passed 384 / 808 test cases
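To put those two results on the same scale, here is a minimal sketch of the arithmetic. It only uses the numbers reported above; the helper name `to_seconds` and the variable names are made up for illustration.

```python
# Rough comparison of the two runs reported above.
# Only the timings and test-case counts come from the post;
# all names here are illustrative.

def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a 'Xm Ys' duration into total seconds."""
    return minutes * 60 + seconds

o3_time = to_seconds(2, 43)   # o3-mini-high: 2m 43s
r1_time = to_seconds(6, 15)   # DeepSeek r1: 6m 15s

r1_pass_rate = 384 / 808      # r1 passed 384 of 808 test cases
speedup = r1_time / o3_time   # how many times longer r1 took

print(f"r1 pass rate: {r1_pass_rate:.1%}")
print(f"r1 took {speedup:.1f}x as long as o3-mini-high")
```

So r1 passed a bit under half of the test cases while taking a little over twice as long. This is a single problem and a single run per model, so treat it as an anecdote, not a benchmark.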

It’s possible that o3 encountered this problem during pre-training, but the improvement in speed and accuracy is still remarkable. AI coding models are evolving rapidly, and this seems like a major step forward.


That’s amazing! Thanks for sharing.

btw how are you selecting o3-mini-high in Cursor? I only see o3-mini


sadly this hasn’t been my experience so far. i’ve tried both o3-mini and r1 for some hours on a relatively small codebase (around 30k loc). r1 performed better imo. still too early to tell tho.

update after spending more time with both:


yoo, i stand corrected, o3-mini got some moves ngl


where are all the hypers who were promoting r1 for the last couple of weeks?

feels a little odd honestly… i was never impressed with r1… deepseek-chat has been equivalent and cheaper for fast little api calls in my app…

just odd with the huge promoting of it here… and then days later nvda crashes because everyone’s afraid of deepseek… and given the deepseek founder’s background in investing… seems like a nice guy but just an odd couple weeks.

R1’s thinking tokens are invaluable for debugging derailing prompts.

One-shotting is kind of a myth for real coding.


@joefaron I think that more than hype, DeepSeek is a combination of being really good, open source, cheap, and open about showing its “reasoning” chain. And in practice it is really good to work with. In my opinion it is just the kick-off for a paradigm shift in AI towards open source in many parts of the chain, which will boost everything at different levels.

Also, I don’t know if we would have had a cheap o3-mini, and so much generosity from OpenAI, if this hadn’t happened :smile:


Give it a week and we will say the same about o3-mini