O3-mini not agentic?

I’m able to open o3-mini in the Composer, but I encounter a problem. It provides good reasoning, but when it says it’s going to apply the first changes to my component, it just stops. Does anyone else have this issue? It seems like it’s not actually changing the files.

6 Likes

Yep, same issue. I end up wasting several calls just asking it to apply the code, but it never does.

Hey, I haven’t noticed that yet. Does it happen often? Does it occur at the start of the conversation or once the conversation gets longer?

o3-mini in Cursor is awful… x.com
I tried 3 different tasks, always with similar results. The fastest I managed to get some code out of it was after 3 follow-ups asking for code. This irritates me in a major way; back to Sonnet for now, I guess.

3 Likes

same issue: it doesn’t generate code, and if it does, it doesn’t apply it automatically.

same, it’s just talking

well, you can guide it to write code blocks properly, in a form the IDE (the apply model) can apply. reminding it that it should “format it as triple backticks followed by language, then : and then file name” seemed to work, though it’s rather annoying. maybe this workaround could be put into Cursor rules? not sure…
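To make the quoted format concrete, here is a minimal Python sketch of what “triple backticks followed by language, then : and then file name” would look like as an opening fence line. This is only one reading of the post: the helper names `format_block` and `parse_header` are hypothetical, and nothing here is confirmed to be what Cursor’s apply model actually checks.

```python
import re

# Three backticks, built programmatically so this example nests cleanly.
FENCE = "`" * 3

# Opening fence line: three backticks, a language, a colon, then a file name.
HEADER = re.compile(r"^`{3}(?P<language>[^\s:`]+):(?P<file_name>\S+)$")


def format_block(language: str, file_name: str, code: str) -> str:
    """Wrap code in a fence whose opening line is 'language:file_name'."""
    return f"{FENCE}{language}:{file_name}\n{code.rstrip()}\n{FENCE}"


def parse_header(line: str):
    """Return (language, file_name) if the line matches the format, else None."""
    match = HEADER.match(line.strip())
    return (match["language"], match["file_name"]) if match else None


block = format_block("tsx", "src/Button.tsx", "export {}\n")
first_line = block.splitlines()[0]
print(first_line)
print(parse_header(first_line))
```

A check like `parse_header` could be used to spot replies where the model wrote a plain fence (language only, no file name), which is the case the post says needed a reminder.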

+1. it happens often; it happens more often than it makes code changes for me. it can happen at any time in the conversation, beginning or after a couple requests. sometimes you only need to say “proceed” or “make the change” once, other times you have to say it 999999999999999 times. The o3-mini agent will always end with saying it’s going to do something, and then not do anything at all.

same here… what I’m doing is asking to add the changes to the composer window, and then asking another model to reapply the changes. 2 credits over 102 if I try to force it to apply.

1 Like

Can confirm: o3-mini often just says “I’ll do that”, then stops.

1 Like

I am not sure why it performs so badly, when in benchmarks all the versions (even the low one) look like solid models. It definitely sees the tools it can use, and tool use should be its strong suit.

I just don’t understand how it can fail at such trivial tasks as replacing two words in markdown. It used the edit file tool two times (this shouldn’t happen; it should have worked the first time), and the apply model suggested deleting a whole unrelated section…

After more usage, it’s almost a feature, not a bug: it forces you to use o3-mini as an architect, then switch to Claude for action. I think this is probably the usage we will converge on anyway.

2 Likes

Thanks for everyone’s reports!

My guess is that o3-mini does not behave like o1-mini in how it interacts inside Cursor. The team is looking into this bad behavior.

11 Likes

Same problem here.

This problem happens with both DeepSeek and o3-mini.

2 Likes

This is right at the start.

Just a tidbit of observation from o3-mini in a different AI editor:
in Cursor it “stopped” for me right after it finished reasoning, whereas in the other product a tool call with the actual code block apparently came in at that point.
That latter part occasionally gets broken in Cursor. HTH :slight_smile:

1 Like

It really is depressing he always deletes all the code the way and the sonnet

Now with 0.45.8 I can’t get it to show me the diff of the modifications, or to apply changes, even after insisting several times with “yes, proceed”, “yes”, “apply it please”, “PROCEED!”

It keeps confusing the files, and now it even says that it has already applied the changes when it hasn’t made any modifications. An intermediate workaround was to at least see the diff in Composer and apply it manually, but now that it considers the changes already applied, it apparently no longer shows the diff.

The reasoning and proposals it gives are really interesting but it still doesn’t work very well in practice.

I understand that it’s still very early :+1:

Horrible experience with o3: not only does it not write code when it has the “supposed solution”, but 90% of the time it has damaged my code. I went back to Sonnet. o3 is NOT agentic.

1 Like

Hey all, we are finding o3 to be a decent downgrade in terms of output reliability and hallucinations, but we are working to improve our integration with it to mitigate as much of this as possible.

We have already shipped some improvements to our backend, but more will come over the coming days and weeks to bring o3 closer to o1 and DeepSeek R1-level behavior.

4 Likes