Awesome. I'm getting great results via chat with o3 - using chat for the first time in a while. It's the composer flow that is problematic with o3. Based on how strong it is in chat, I suspect it will be really strong once we get composer working.
Just updated Cursor to 0.45.9 and it is still very bad…
o3-mini took like a minute without any output and what it wrote was, how to put it politely, “not good”. It most likely utterly failed because it didn’t do any codebase search.
Code is pretty much nonsense, not sure why it “Stopped”, but the end of the text suggests o3-mini wasn’t going to do anything more anyway.
Meanwhile Sonnet essentially zero-shotted it with the same context and prompt. (Maybe not perfect, e.g. the filename should be put in a constant, but it works.)
Same for me, it's not applying any code most of the time, and when it does, the overall responses feel super dry, like it didn't want to work but was forced to :). It silently starts making changes without saying what it's trying to do, and in the end just provides a very short report of what was done, with very little enthusiasm to help. In contrast, Claude is very positive and always happy to jump in and help.
I updated my Cursor rule to tell it: 'don't tell me what you plan to do, just create or update the files'.
This has made it work with a much higher success rate, about 90% I would say, if not higher.
But as the chat gets longer, it becomes more likely to stop following that rule.
Starting a brand-new chat gets it back to functioning really well again.
same here.
Yes, same for me! With Claude everything is so well integrated… you give a prompt and it really analyzes what the task is… it goes step by step… searches and greps the code, looks at every file, enters every file, makes the change, verifies everything it just did! everything just works great!
With OpenAI’s O3 I don’t get this behavior, it just changes things or doesn’t analyze them properly. Hopefully, this improves soon because I don’t think it’s the model itself. Probably with a reasoner model, this kind of step-by-step behavior, where you can prompt it to be more careful and avoid mistakes, should work much better.
Same here, o3-mini can't use tools very well. It says "I'll do blah blah" and then does nothing.
That worked for me: “I must remind you to format it as triple backticks followed by language, then : and then file name”
It then properly provided the code.
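If anyone wants to check replies for that format automatically, here is a rough Python sketch. It assumes the format is exactly what the reminder describes (three backticks, a language, a colon, then a file name with no spaces). The helper name `find_file_blocks` and the regex are my own guesses, not anything taken from Cursor.

```python
import re

# Hypothetical helper, not part of Cursor. The fence is built from single
# backticks so this example does not contain a literal fence itself.
FENCE = "`" * 3
HEADER = re.compile(r"^" + FENCE + r"(?P<lang>[\w+-]+):(?P<path>\S+)\s*$")


def find_file_blocks(reply: str) -> list[tuple[str, str, str]]:
    """Return (language, path, code) for each block opened with a lang:path header."""
    blocks = []
    lines = reply.splitlines()
    i = 0
    while i < len(lines):
        match = HEADER.match(lines[i])
        if match is None:
            i += 1
            continue
        body = []
        i += 1
        # Collect lines until the closing fence; an unterminated block is
        # still returned with whatever text followed the header.
        while i < len(lines) and lines[i].strip() != FENCE:
            body.append(lines[i])
            i += 1
        i += 1  # step past the closing fence
        blocks.append((match["lang"], match["path"], "\n".join(body)))
    return blocks
```

An empty result would mean the reply contains no block in that format, which is probably the cue to send the reminder again.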
Overall o3-mini feels like a bored senior dev whose everyday work consists solely of somebody asking him for advice on the codebase. He resists fixing the code himself.
There is always a 2-step process of “getting the plan” and then “begging to make code changes”. It's probably a good idea to put the instruction to fix the code right there in the rules.
Sonnet, while lacking some architectural skills, is eager to fix the codebase, treating it as a part of each request. I even added lines to .cursorrules some months ago like “when you see an architectural change or complex task, first provide a plan and do not write any code”.
This leads to an idea (funny how obvious it seems right now) of keeping model-specific rules separately: a different role setting and different instructions for each model.
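To make the idea concrete, here is a minimal Python sketch of per-model rules. As far as I can tell from this thread this is not an existing Cursor feature; the model-name prefixes, the default text, and the function `rules_for` are all made up for illustration, and the rule texts just paraphrase the ones quoted above.

```python
# Hypothetical mapping from a model-name prefix to its own rule text.
MODEL_RULES = {
    "o3-mini": "Do not describe your plan. Create or update the files directly.",
    "claude": (
        "When you see an architectural change or complex task, "
        "first provide a plan and do not write any code."
    ),
}

# Assumed fallback for models without a dedicated entry.
DEFAULT_RULES = "Follow the existing code style."


def rules_for(model: str) -> str:
    """Pick the rule text whose prefix matches the model name, else the default."""
    name = model.lower()
    for prefix, rules in MODEL_RULES.items():
        if name.startswith(prefix):
            return rules
    return DEFAULT_RULES
```

Matching on a prefix keeps one entry valid across version suffixes, at the cost of being wrong if two model families ever share a prefix.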
About non-agentic behavior: from what I see in the composer UI, o3-mini did not try to grep the codebase. Not even once. As a result, it suggests changes that are inconsistent with the existing codebase or are already implemented. I have to include specific files in the context myself and ask it to reevaluate my request. It's a bored senior dev )))
Hilarious, I've been picturing the gen z intern who resists work and totally ignores the in-house style, but I could also see a senior dev now. Either way, it needs to step up.
exactly…I feel like O3 mini has a much better understanding of the architecture and is able to make changes with the dependency impacts in mind. But it just doesn't want to do it. A lazy senior dev who wants a junior dev (Sonnet?) to do it for them…
well, with light role-play in cursor settings, o3-mini sounds eager to help, not that much different from sonnet.
too bad it struggles so much in agentic mode - it forgets it can grep and gives changes without using the edit_file tool, (fairly often) without backticks with the file path, or (sometimes) without backticks at all…
today Cursor tried to create a src dir in the root of my system partition, because o3-mini messed up paths…
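I don't know how Cursor resolves paths internally, but the general guard against this kind of thing is simple enough to sketch in Python: resolve whatever path the model suggests against the workspace root and refuse anything that lands outside it. The function name `resolve_inside` and its arguments are hypothetical.

```python
from pathlib import Path


def resolve_inside(workspace: str, suggested: str) -> Path:
    """Resolve a model-suggested path, refusing anything outside the workspace."""
    root = Path(workspace).resolve()
    # Joining with an absolute path such as "/src" discards the workspace root
    # entirely, which is one way a directory can end up at the filesystem root.
    target = (root / suggested).resolve()
    if target != root and root not in target.parents:
        raise ValueError(f"{suggested!r} resolves outside the workspace: {target}")
    return target
```

With a check like this, a suggested "src/main.py" stays under the workspace, while "/src" or "../outside" raises instead of touching the disk.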