To put some positivity here - I can’t say for certain whether it’s where I’m at in this particular applet, or whether my interaction with LLMs, my prompting, or some higher-level practice on my part fuels this, but I find Auto to be much improved. I had really advised against it earlier in Cursor’s lifetime, but it really is starting to pick models better than I am. Has anyone else noticed an improvement specifically here? I used to shy away from it for larger edits, but it seems to pick models well now and execute the ACTUAL agentic edit correctly and precisely.
I’ve found:
o4-mini - if you struggle with other models, give this one a try; force it to web search, and it is strong for specific things
Gemini 2.5 - great for Ask on refactors, but hand that Ask output to Agent with another model like o4-mini. I find Claude is best for high-level architecture conversations via Ask, but many models fail at precise edits, and interaction with the linter can be brittle
Claude 4 Opus - literally godlike. I cannot afford this, but it is strong. Cost-insensitive users, I envy you and sometimes pretend to be you
Claude 4 Sonnet - good for conceptual Ask convos. Struggles with CSS, as do many of the more verbose models
The problem I find is that you trade one problem for another. I’ve gotten to the point where I can say it is very strong for most prototyping, and you can slog through dev work, which is difficult: as the node packages grow, so does the overlap and the burden of semantic and system adherence. Still, I remade Google Maps, interwove 4 APIs, and refactored, and we’ve stumbled at times (it’s hard to prompt every little thing out of a move, so I just revert often and try again - sometimes the second try is iteratively much closer). All in all, I thought I’d leave my experiences here. I’ve made some pretty complex apps to really run this thing through the hard stuff, and I feel Auto mode is getting close to feeling VERY fluid in my recent experience.
I haven’t tried Auto for a very long time. I’d like to know what the decision-making process is behind ‘auto’. Does it actually use the context of your request to decide which model is most capable? If so, then perhaps it plays to the strengths of each LLM… and in that case, it’s definitely worth trying again.
If it just chooses the cheapest one or the least busy one, then that’s kind of pointless - something like the second strategy in the sketch below.
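To make the distinction I’m asking about concrete, here’s a purely hypothetical sketch (TypeScript, with made-up model names, costs, and tags - I have no idea how Cursor actually routes requests) of a context-aware router versus a cheapest-model router:

```typescript
// Hypothetical sketch only - none of this reflects Cursor's real implementation.
interface ModelOption {
  name: string;
  costPerMTok: number;   // made-up cost per million tokens
  strengths: string[];   // made-up capability tags, e.g. "refactor", "ask"
}

const models: ModelOption[] = [
  { name: "o4-mini",       costPerMTok: 1, strengths: ["web-search", "precise-edit"] },
  { name: "gemini-2.5",    costPerMTok: 2, strengths: ["refactor", "ask"] },
  { name: "claude-sonnet", costPerMTok: 3, strengths: ["architecture", "ask"] },
];

// The version worth using: score each model against tags derived from the request.
function routeByContext(requestTags: string[]): ModelOption {
  const scored = models.map(m => ({
    model: m,
    score: m.strengths.filter(s => requestTags.includes(s)).length,
  }));
  scored.sort((a, b) => b.score - a.score);
  return scored[0].model;
}

// The "kind of pointless" version: ignore the request and pick the cheapest model.
function routeByCost(): ModelOption {
  return models.reduce((a, b) => (a.costPerMTok <= b.costPerMTok ? a : b));
}

console.log(routeByContext(["refactor"]).name); // gemini-2.5
console.log(routeByCost().name);                // o4-mini
```

If Auto is doing anything like the first function, it plays to each LLM’s strengths; if it’s doing the second, it doesn’t.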
Also, with regard to the models: if I need to do some type of SDK or API integration, I’ll generally just choose the model with the most recent training data. Pointing it at a URL with documentation is often useless, as it doesn’t crawl the site to get all of the info it needs to complete the integration.
Back here to say there is a massive blind spot in Auto.
When you switch discretely from model to model, there is massive architectural disregard and huge deviation from goal orientation. There seems to be poor model handoff or context switching, or perhaps they are not re-embedding or re-tokenizing the context upon a model switch. It’s so bad I am once again saying DO NOT USE!
EDIT: I have to wonder at this point when it’s fair to say this is no longer just AI fluctuation in output. I think this is pretty much (given the ups and downs) completely dependent upon the stewardship, i.e. how our inputs are handled and passed to the models. I still think it’s fair to say that if that’s your product, you have to be responsible for quality, especially in usage-based products where the switch after a rate limit creates this issue in the first place and actively suggests you jump right into a model switch - to AUTO.