o3-mini looks promising on the benchmarks, but is it good in practice?
To those who have tested both: which is better in your opinion, o3-mini, or is Sonnet 3.5 still the way to go?
lol, reading the forum gives mixed vibes. I'd also be interested in specific prompts where o3-mini is preferred over Claude, and why.
Some say o3-mini is sooo good, while others struggle with it completely.
As far as I understand, o3-mini isn't yet optimally integrated with the composer agent, but the Cursor team should update us when they've made changes.
I think it depends on what you ask. My guess is that it's a smaller model, so if you code in a less popular language, it performs worse. If you're coding a snake game in Python, it's marvellous. Hence the all-over-the-place comments.
For me it hallucinates more because it knows less Elixir than Sonnet 3.5. Sonnet or even 4o is better for me. R1 if you need some thinking; o3-mini if you need some fast thinking.
Have you tried adding the Elixir docs to @Docs in settings? That should improve its knowledge.
Thanks for the feedback about R1 and o3-mini.
It can, but in the end Sonnet 3.5 already performs very well, and it's annoying to call the docs on each request.
o3-mini is very fast and responds well to simple requests, but for complicated things Claude is still better at contextualizing and building…
Why not both, they are different tools for different jobs.
Claude 3.5 Sonnet: Chat
o3-mini: Thinking
For chat you want low latency, but sometimes you have a problem that is worth spending the extra time getting the model to think about it.
Claude: Gather requirements, context and build up a plan (chat)
o3-mini: Review plan then go write this complicated code or fix this non trivial issue.
Claude: Make these minor adjustments to the code (chat).
Wouldn't it be more complicated to use two different models in sequence within the same chat? I don't run two models in one chat because I assumed that would be a problem. Have you done something like this before?
Just change the model in the dropdown; it takes a second.
If I want to do something really simple I move from Sonnet to Haiku.
For example committing and pushing the code, I don't need to pay 4 cents to do that, Haiku can handle that for 1 cent.
I understand that, brother; my question is this:
When you switch between different models in the same chat window, isn't the model more likely to get it wrong? You started the chat window with Sonnet, but when you pick Haiku to solve a simple problem, isn't Haiku more likely to get it wrong? Or after Haiku, isn't Sonnet more likely to get it wrong? After all, whichever model you started with in a chat window has more of the context history.
Sonnet seems more talkative in the agent mode which I actually like.
My understanding, and my experience, is that every request is independent: the context is stored locally and sent to the server with every request.
Effectively when you make a second request it sends up the current context/request and the past chat history.
It's the same whether you do Sonnet → Sonnet or Sonnet → Haiku; each request is fully independent.
In other words it's serverless, so no state is stored on the server except caches.
There would be a performance and cost hit (for Cursor) switching from Sonnet → o3-mini, but I have not noticed an issue.
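To make the stateless model concrete, here is a minimal sketch (a hypothetical request shape for illustration, not Cursor's actual protocol): the client keeps the whole history and resends it with every call, so the model name can change freely between turns.

```python
# Hypothetical stateless chat client: the server remembers nothing,
# so each request carries the full conversation plus the new turn.

def build_request(model, history, new_message):
    """Assemble one self-contained request payload."""
    return {
        "model": model,  # can differ on every request
        "messages": history + [{"role": "user", "content": new_message}],
    }

history = []

# First turn goes to Sonnet.
req1 = build_request("claude-3.5-sonnet", history, "Plan the refactor.")
history = req1["messages"] + [
    {"role": "assistant", "content": "Here is a plan..."}
]

# Second turn switches models; o3-mini still sees the whole conversation
# because the client sent it along, not because the server remembered it.
req2 = build_request("o3-mini", history, "Now implement step 1.")

assert req2["model"] == "o3-mini"
assert len(req2["messages"]) == 3  # prior turns travel with the request
```

This is why switching Sonnet → Haiku mid-chat doesn't lose context: the later model receives exactly the same history the earlier one would have.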
Sonnet talks a lot but makes far more mistakes and is atrocious at following basic commands. That said, o3 is buggy and you have to tell it several times to do one thing. It will say "doing that now" and then nothing happens. However, it does things with more "thought".
The output token limit of o3-mini is way bigger than 3.5 Sonnet's, so I've noticed I'm less likely to get frustrated by context being omitted from its responses.
Based on the Aider leaderboard, o3-mini wins against Claude 3.5 Sonnet.
People on the internet tend to like Sonnet better. On the other hand, I do primarily scientific computing and find that o3 does better not just at "architecture"-type tasks but even at single-line code requests.
Your findings align with the benchmark reports, especially concerning math and instruction following. Sonnet has the best "tooling" support, so these "reasoning" models require a stronger effort from the Cursor team; that's why R1 still doesn't have an agent mode and why o1 has only become usable in the last few days.
Benchmarks:
LiveBench
OpenLM