When using o3-mini in Composer (agent mode), it will often say “I will make these changes” and then stop generating after that sentence.
From there, it can take multiple requests to get it to make the changes it proposed. I’ve found prompts like “Implement the changes” work about 50% of the time; other times it again returns a text response with no action taken.
This issue burns a TON of usage and dilutes the model’s context with unnecessary requests, making it hard to work with and less effective than it could be.
P.S. There’s also a formatting issue where o3-mini writes code as plain text instead of inside a formatted code block.
It’s not only o3-mini; the integration of other reasoning models isn’t great either (pretty much useless for development, in practice). I don’t think these are the models’ own issues. This needs to be addressed by Cursor and the other AI coding IDEs.
Yes, every model/provider has slightly different requirements and handling.
While the APIs are nowadays almost standard, the models are not.
As we know, the differences are not just in context length but also in how certain prompts (including Cursor’s internal system prompts) cause different behavior in different models. That is why the Cursor team often reports that they are testing new experimental models and have to adjust their internal pre-processing to get better results.
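To make that concrete: even against the same OpenAI-style chat completions endpoint, a client has to branch per model family. A minimal sketch, assuming the OpenAI Python SDK; the parameter rules (reasoning models rejecting `temperature` and taking `reasoning_effort` / `max_completion_tokens` instead) reflect my understanding of the current API, not anything Cursor has confirmed about its internals:

```python
# Sketch: the same "standard" chat-completions call still needs
# per-model branching. Parameter support shown here is my assumption
# about the OpenAI API at the time of writing.
from openai import OpenAI

client = OpenAI()

def complete(model: str, messages: list[dict]) -> str:
    kwargs = {"model": model, "messages": messages}
    if model.startswith(("o1", "o3")):
        # Reasoning models: no temperature/top_p; they accept
        # reasoning_effort and max_completion_tokens instead.
        kwargs["reasoning_effort"] = "medium"
        kwargs["max_completion_tokens"] = 1024
    else:
        # Conventional chat models take the usual sampling knobs.
        kwargs["temperature"] = 0.2
        kwargs["max_tokens"] = 1024
    resp = client.chat.completions.create(**kwargs)
    return resp.choices[0].message.content
```

Multiply that by differing context lengths and prompt sensitivities and you can see why every new model needs its own tuning pass.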
Luckily, some of those changes are done on the Cursor server side, which dispatches requests to the LLMs and does pre-processing such as indexed documentation selection. Other parts use Cursor-created LLM models that reduce the load on the heavier models by preparing the input.
You can see similar behavior in other AI tools like Perplexity, Claude Chat, or OpenAI’s ChatGPT when they search the web before the LLM processes the results.
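Here is a rough sketch of that two-stage pattern, purely hypothetical: the function names, prompts, and model choices are illustrative stand-ins for whatever Cursor actually runs server-side, not its real pipeline:

```python
# Hypothetical two-stage pipeline: a lightweight model prepares the
# input (context selection) so the heavy model gets a smaller,
# cleaner prompt. All names and prompts are illustrative only.
from openai import OpenAI

client = OpenAI()

def select_context(query: str, indexed_docs: list[str]) -> str:
    """Cheap pre-processing pass: pick only the relevant docs."""
    listing = "\n".join(f"[{i}] {d[:200]}" for i, d in enumerate(indexed_docs))
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # small, fast model for pre-processing
        messages=[{"role": "user",
                   "content": f"Query: {query}\nDocs:\n{listing}\n"
                              "Reply with only the numbers of the relevant docs."}],
    )
    picks = {t.strip() for t in
             resp.choices[0].message.content.replace(",", " ").split()}
    return "\n\n".join(d for i, d in enumerate(indexed_docs) if str(i) in picks)

def answer(query: str, indexed_docs: list[str]) -> str:
    """The heavy model only ever sees the pre-selected context."""
    context = select_context(query, indexed_docs)
    resp = client.chat.completions.create(
        model="o3-mini",
        messages=[{"role": "user",
                   "content": f"Context:\n{context}\n\nQuestion: {query}"}],
    )
    return resp.choices[0].message.content
```

The design point is simply that the expensive model never has to read the whole index, which is why this kind of pre-processing reduces load.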
Unfortunately, there is one part we do not see but which can cause issues: when AI providers adjust their model behavior, pre-prompts, or processing, for example monitoring or feedback loops, parameter optimization, dynamic routing to different server configurations, distillation/quantization/precision adjustments for faster inference, adjusting concurrency limits per server, …
Cursor has no influence on that part.
I have had great experiences with several of the reasoning models, both through Cursor and in tools unrelated to coding. But current reasoning models do not always reason well. DeepSeek-R1 often treats data passed in from RAG (search/index, …) as more correct than the user prompt, even when you tell it what the facts are. The ‘thinking’ process, if you ever read it, can be a mess.
That means its use as a coding tool is also affected by such issues. It simply handles prompts differently than a non-reasoning conversational LLM does.
I also hope the Cursor team can tweak the process and prompts for better output.
I noticed that as of yesterday, o3-mini in Cursor (Composer agent) has gotten a lot better at tool calling and can usually do a chain of fixes. I feel the number of times I have to say “proceed” and “use diff edit” has been reduced by 90%. This makes the model a lot more usable now; in fact, sometimes I prefer it over Sonnet when I notice that Claude is going in loops.