O3-mini not agentic?

Awesome. I'm getting great results via chat with o3 (using chat for the first time in a while); it's the Composer flow that is problematic with o3. Based on how strong it is in chat, I suspect it will be really strong once we get Composer working.

Just updated Cursor to 0.45.9 and it is still very bad…

o3-mini took about a minute without producing any output, and what it wrote was, to put it politely, "not good". It most likely failed because it didn't do any codebase search.

The code is pretty much nonsense. I'm not sure why it "Stopped", but the end of the text suggests o3-mini wasn't going to do anything more anyway.

Meanwhile, Sonnet essentially zero-shotted it with the same context and prompt. (Maybe not perfect, e.g. the filename should be moved into a constant, but it works.)

Same for me: it doesn't apply any code most of the time, and when it does, the overall responses feel super dry, as if it didn't want to work but was forced to :). It silently starts making changes without saying what it's trying to do, and at the end just provides a very short report of what was done, with very low enthusiasm to help. In contrast to Claude, which is very positive and always happy to jump in and help.

I updated my Cursor rule to tell it: "Don't tell me what you plan to do, just create or update the files."
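For anyone who wants to try the same approach, here is a minimal sketch of what such a rule might look like in a `.cursorrules` file. The exact wording below is my own illustration, not the original poster's verbatim rule:

```
# .cursorrules (hypothetical wording of the rule described above)
Do not tell me what you plan to do.
Do not output a plan or a summary before acting.
Immediately create or update the relevant files,
then give a one-line report of what changed.
```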

This has made it work with a much higher success rate, about 90% I would say, if not higher.

But as the chat gets longer, it becomes more likely to ignore that Cursor rule.

Starting a brand-new chat gets it back to functioning really well again.

Same here.

Yes, same for me! With Claude everything is so well integrated… you give a prompt and it really analyzes what the task is… it goes step by step… searches and greps the code, looks at every file, makes the change, verifies everything it just did! Everything just works great!

With OpenAI's o3 I don't get this behavior; it just changes things or doesn't analyze them properly. Hopefully this improves soon, because I don't think it's the model itself. With a reasoning model, this kind of step-by-step behavior, where you can prompt it to be more careful and avoid mistakes, should probably work even better.
