O3-mini agent mode is insane

thibaud57 · February 6, 2025, 9:41am

not working for me

designnotdrum · February 7, 2025, 4:29pm

experiencing the same. it’s amazing how much better o3 is at getting things right, but it seems… distracted? I constantly have to point it at the right files or tell it to try again.

jchase · February 7, 2025, 5:08pm

Yeah I’m finding that Claude 3.5 Sonnet is still my ultimate go-to & first choice for about 80% of my workload because it’s so fast for iteration and brainstorming as I go, and just works so well in agent mode. But then o3-mini is a great option when I’m dealing with something where I know exactly what the problem is, exactly what I need it to do, and need it to be done very carefully.

eonus01 · February 8, 2025, 1:28pm

I noticed it is a lot more surgical, but seems to do things correctly when you tell it to. Feels like it got better over the past few days. I noticed there were a few updates to Cursor, maybe that has something to do with it? Still have to tell it “proceed” here and there, but I was very surprised that it fixed things Sonnet could not.

appcypher · February 8, 2025, 2:25pm

It seems to have gotten better since I tried it a week ago. In fact, I’m switching to it because it solves problems in one shot more than Claude. The back and forth I would have with Claude has reduced.

Another thing I noticed is that o3-mini replies intelligently about the code it is working on. If you question Claude 3.5 Sonnet’s output, it assumes you want something changed and goes off and changes things. That has always been a frustrating experience. o3-mini displays a lot more confidence and knows when to just answer your question vs. changing things.

Omni-X-Tech · February 8, 2025, 5:58pm

I totally agree! The o3-mini performs extremely well with complex tasks. For me, it manages a big project with many overlapping use cases and features. It’s an AI-based desktop Windows application written in Python that utilizes a lot of thread management with the Windows sub-systems. I’m amazed by how well it handles pattern-based coding, as well as a test-driven approach using test-first! Wow!

Note: The only thing I’ve noticed is that o3-mini sometimes (rarely) fails to output the instructions for the changes that Cursor needs in order to take action. But it’s most often due to me prompting it wrong. I fix it by telling it “something’s wrong, cursor instructions are missing.”

rongfengzou · February 8, 2025, 7:20pm

Does that mean we can replace Claude sonnet with o3-mini?

jchase · February 9, 2025, 9:53am

I personally wouldn’t treat it as a replacement - I use them both for different contexts and I’m still finding Claude much more useful for creative work, brainstorming new features and where I don’t mind a lot of continuous iteration. o3-mini really shines for maintaining existing code where I don’t want a lot of iteration (just want it to get it right on the first try).

iambaseddev · February 9, 2025, 1:15pm

I’ve personally moved to o3-mini; Claude keeps making a lot of mistakes.

akasim · February 9, 2025, 2:09pm

Have any of you tried some .cursorrules that make the o3 agent consistently apply file changes? I played around a little but couldn’t achieve what I wanted, and thought someone else might have managed it.

heyitschien · February 9, 2025, 5:14pm

Thanks for sharing your thoughts! I suppose that’s why we have access to different models. Sonnet has proven to be a workhorse and has been quite stable so far.

benjaminfortunato · February 9, 2025, 10:19pm

I’ve had mixed results. o3-mini is definitely a big improvement and can whip up some amazing results. When it comes to working with specific APIs, there are often a ton of hallucinations. I need to go through the docs and figure out what the actual namespaces, methods, etc. are. I wish there was a way to feed it some docs so it could be a bit more accurate. I know there is work being done with RAG to get domain-specific knowledge and more accurate results.

It could also be that I’m working on some very niche use cases that need a bit of digging to resolve... sometimes it requires a bit of work with the docs and forums, and then tinkering to figure out what works. Can’t expect too much...

oscarle · February 10, 2025, 1:43am

So can anyone give me a breakdown? I only use Claude20241022 for coding, but I’m still curious whether o3-mini or deepseek r1/v3 can give better results.

saketsarin · February 10, 2025, 8:48am

o3-mini is pretty good if you want to use agent mode with a reasoning model as good as deepseek r1, but in my experience claude sonnet is still better at generating code

jchase · February 10, 2025, 12:31pm

Hey just checking but you know you can feed Cursor docs to give it more up-to-date context right? One of the best conveniences of Cursor actually.

Under Cursor Settings (Cmd + Shift + J) > Features > Docs, add a new doc and give it a URL to index. (Sometimes you have to check what it actually indexed; depending on the documentation’s URL, it may or may not crawl every part of the documentation.)
Then in chat or composer, use @docs to reference the documentation.

benjaminfortunato · February 10, 2025, 12:48pm

Wow! Will check that out. That is awesome. I’ll try that out and see if it helps.

A bit of a separate topic, and I might want to start a new thread, but do you know if Cursor is also looking at creating a Visual Code alternative? I actually had to move from Cursor to Visual Studio since I’m working on a C# .NET plugin for our web service. I’ve been using GitHub Copilot in Visual Studio since Cursor can’t put together a compiled project.

jchase · February 14, 2025, 4:36pm

I don’t know if that is planned, but I’ll tell you something I’ve found that’s worked for me which may or may not be helpful in your situation: I’m separately developing a Swift app in Xcode, but I really wanted to have Cursor help with it. I found that I can have Cursor manage the codebase entirely, while having Xcode open simultaneously. So Cursor does all the editing, and Xcode does all the compiling. Works really well. Again, I don’t know if that would apply to your situation, but I haven’t had any problems with that workflow. Any linting errors that Xcode reports, I just pass right back into Cursor.

benjaminfortunato · February 14, 2025, 5:22pm

I guess that would work. I’ll try it out. GitHub Copilot sometimes messes up where it places the code. It has some special commands to reference your codebase, but I found that a lot of the time I get better responses from ChatGPT and o3-mini-high or just plain old 4o.

We’re developing a plugin, so Visual Studio offers some templates with boilerplate, but I could generate those and then Cursor would just see the new file. You are just using Cursor to open the folder where you are developing your code in Xcode? In Visual Studio I need to add files to the project, embed resources, etc., but I guess I don’t need to worry about all that with Cursor since it just opens up all the .cs, .js, etc. files that are in the folder.

What about code references to DLL libraries? I think that is more of a .NET issue. But I start with using Namespace.x. It would be like using an import statement in JavaScript. I don’t think Cursor is set up to handle that. The IntelliSense won’t work properly. Do you have a similar issue with external libraries when writing in Xcode?

jchase · February 14, 2025, 6:25pm

Yup exactly. Whatever the equivalent of the “src” folder is, I just have them open in both simultaneously. I haven’t run into issues with build files but I’ll often use a .cursorignore file if there’s anything I need Cursor to ignore (e.g. large binaries). Can’t speak to references to dll libraries directly but if they’re common libraries and the model is at least aware of them, can’t imagine it being much of an issue. I know in Swift there’s common imports and such and it adds them as needed like anything else.
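
A minimal sketch of what such a .cursorignore could look like, assuming it accepts .gitignore-style patterns; the folder and file names below are hypothetical placeholders, not taken from any real project:

```
# .cursorignore (hypothetical example; .gitignore-style patterns assumed)

# Build output that the IDE regenerates on every compile
build/
DerivedData/

# Large binaries the model has no reason to read
*.bin
*.zip
```

The idea is simply to keep generated or heavy files out of Cursor’s view while the IDE that does the compiling still sees the whole project.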

benjaminfortunato · February 14, 2025, 6:52pm

I’ll give it a try. I think the DLLs are a bit different since you can’t install them using a package manager like npm or NuGet. They’re part of the program we are writing the plugin for. I have a feeling the issue is going to be that we won’t get IntelliSense working in Cursor. I’ll start another post and see if anyone has any suggestions.
