O3-mini not agentic?

Robert137498 · February 2, 2025, 8:23pm

Awesome. I'm getting great results via chat with o3, using chat for the first time in a while; it's the composer flow that is problematic with o3. Based on how strong it is in chat, I suspect it will be really strong once we get composer working.

Kirai · February 3, 2025, 8:18am

Just updated Cursor to 0.45.9 and it is still very bad…

o3-mini took like a minute without any output and what it wrote was, how to put it politely, “not good”. It most likely utterly failed because it didn’t do any codebase search.

Code is pretty much nonsense, not sure why it “Stopped”, but the end of the text suggests o3-mini wasn’t going to do anything more anyway.

Meanwhile Sonnet essentially zero-shot it with the same context and prompt. (Maybe not perfect, e.g. the filename should be put in a constant, but it works.)

TargiX · February 3, 2025, 2:24pm

Same for me: it doesn't apply any code most of the time, and when it does, the overall responses feel super dry, like it didn't want to work but was forced to :). It silently starts making changes without first saying what it's trying to do, and in the end just provides a very short report of what was done, with very little enthusiasm to help. In contrast, Claude is very positive and always happy to jump in and help.

bmadcode · February 3, 2025, 6:16pm

I updated my Cursor rule to tell it: 'don't tell me what you plan to do, just create or update the files.'

This has made it work with a much higher success rate, about 90% I would say, if not higher.

But as the chat gets longer, it becomes more likely to stop following that rule.

Starting a brand-new chat gets it back to functioning really well again.

r1di · February 3, 2025, 9:36pm

Same here.

irux · February 4, 2025, 8:49am

Yes, same for me! With Claude everything is so well integrated… you give a prompt and it really analyzes what the task is… it goes step by step… searches and greps the code, looks at every file, enters every file, makes the change, verifies everything it just did! Everything just works great!

With OpenAI’s O3 I don’t get this behavior; it just changes things or doesn’t analyze them properly. Hopefully this improves soon, because I don’t think it’s the model itself. With a reasoning model, this kind of step-by-step behavior, where you can prompt it to be more careful and avoid mistakes, should probably work much better.
