O3-mini agent mode is insane

OK I have no reason to make this post other than I am dumbfounded by what I’ve just witnessed.

I had a service provider who made a significant, complicated change to their API that I needed to accommodate in an existing Svelte app. Not only were the changes complicated, but I needed to maintain backwards compatibility with the prior API version for specific data. It took me 40 mins just to formulate the prompt explaining everything, including key examples of the previous API and the new format. Used o3-mini w/ composer agent mode.
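To give a sense of the shape of the problem, here's a minimal sketch of the kind of version-tolerant normalizer involved (the field names and the `Order` type are hypothetical stand-ins, not the provider's actual schema):

```ts
// Hypothetical example: fold both API versions into one internal shape.
interface Order {
  id: string;
  amountCents: number;
}

// Old (v1) and new (v2) payload shapes - names are made up for illustration.
type ApiPayload =
  | { version: "v1"; order_id: string; amount: number }               // flat, amount in dollars
  | { version: "v2"; order: { id: string; amount_cents: number } };   // nested, amount in cents

function normalizeOrder(payload: ApiPayload): Order {
  if (payload.version === "v2") {
    return { id: payload.order.id, amountCents: payload.order.amount_cents };
  }
  // Legacy records still arrive in the v1 shape, so keep handling it.
  return { id: payload.order_id, amountCents: Math.round(payload.amount * 100) };
}
```

The real change was much hairier than this, but that's the general pattern the prompt had to spell out, with examples from both versions.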

In one shot it made the changes 100% perfectly, and caught an edge case I didn’t even think of. I actually thought I was looking at the wrong environment at first because I couldn’t find a single thing broken.

Insane.

14 Likes

That’s beautiful man

it's a great time to be alive

1 Like

The agent mode with o3-mini in Cursor is working? Just yesterday it was giving worse code than 4o or medium-sized local models… O3-mini not agentic? - #23 by Kirai

Watching progress here with interest. o3-mini is an amazing model, and I've seen it sing on chatgpt.com. I thought maybe the model just didn't support tools well, but having integrated it into my own systems (which have 20+ tools) I now know that the model supports tools brilliantly. I am assuming that the Cursor team will iron out the problems with it, and it has the potential to become the best model for coding.
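For reference, wiring a tool up for o3-mini in my setup looks roughly like this (a minimal sketch against the OpenAI Node SDK; `get_weather` is just a placeholder, not one of my real tools):

```ts
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Declare one tool the model is allowed to call.
const tools: OpenAI.Chat.Completions.ChatCompletionTool[] = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Look up the current weather for a city",
      parameters: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
    },
  },
];

async function main() {
  const response = await client.chat.completions.create({
    model: "o3-mini",
    messages: [{ role: "user", content: "What's the weather in Berlin?" }],
    tools,
  });

  // If the model decides to use a tool, the call shows up in tool_calls
  // instead of plain text content.
  const message = response.choices[0].message;
  console.log(message.tool_calls ?? message.content);
}

main();
```

In my experience it picks the right tool and fills the arguments reliably, which is why I suspect the current issues are on the integration side rather than the model side.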

1 Like

This is my o3 experience as well - although Composer still needs some kinks worked out… I find it can do more in a single run than Sonnet.


Currently, after a few messages, this model stops wanting to perform file-change actions and has to be asked several times before it does. Other than that, it does a good job. I am waiting for a fix from the Cursor team.

2 Likes


I am seeing the same. Sometimes it says “I’ll now apply the diff” and then does nothing. I’m also getting errors where the code snippet doesn’t get formed, and instead is printed out in the output itself.

1 Like

Yeah, it's strange - like anything else, I'm finding it's better for certain things than others. For instance, while it nailed this particular refactoring problem, in a completely different scenario o3-mini bizarrely missed that April 28, 2025 is a Monday and thought it was a Tuesday.

As of now, it is not usable: it behaves unpredictably, does not apply code changes, and lacks consistency. This is very weird, because it works quite well in chat.

3 Likes

I've found it sometimes has the same bug in ChatGPT when performing Deep Research. If you attempt to run multiple Deep Research requests within one chat, it writes out what it intends to do, says it is going to do it, and then does nothing.

Same here, hate it when it does that!

Would you mind elaborating on how your tools are set up?

With all due respect, o3-mini is not some "insane" model. These are the results of the Aider leaderboard (where models solve hard exercises from https://exercism.org/), and o3-mini is hardly a "super-model":
