If you disagree, show me one example of a prompt that your favorite model can complete from scratch in agent mode that auto mode cannot. Don’t argue about complexity - just give me a concrete example where auto fails and your preferred model succeeds, starting from an empty directory.
Thirdly, I had tasks that Gemini and Grok could barely solve, or did not solve at all. Auto, in this situation, most likely did not even understand what was wanted from it.
And no, I’m not going to go looking for examples of them, because answering “just give me a concrete example” would mean sharing a whole repository, the prompt, and the response.
Yes, the agent does not frequently work in empty directories, but that is the only way you can show me an actual, verifiable example of a problem that your favorite model can solve directly but that the same model cannot solve in Auto mode. Otherwise you are just arguing (as you are here) that your work is more complex than mine, that your code is private or proprietary, or that you don’t want to look for it (but please just trust me anyway, because I know better than you).
Yes, it is slower, but any half-decent developer doesn’t need speed, because they are multi-tasking.
I have tasks that Opus can’t solve immediately without a little help, but it doesn’t take much effort to get similar outputs from the other models.
It is impossible for me to prove that I can do everything on Auto that you can do with a specific model, but you only need to find and show one single example of a task that fails on Auto and works when you give it to a specific model.
Incorrect
All you need is the same context and the same prompt to verify something.
And, ideally, you’d run it three times because of the randomization inside LLMs (which, of course, nobody, including me, is actually going to do).
Incorrect (if taken literally)
GPT-4.1, which is most often used under the hood in Auto, is actually one of the fastest models currently available.
Yeah, that’s what I meant. Sorry if I misread your level. And, yeah, I’m too lazy to gather that data for that internet argument: GPT-4.1 is just orders of magnitude dumber than all the frontier models — that’s an axiom.
Sometimes Auto randomly uses one of the Claudes, and then you might get lucky and pull off something moderately complex.
Auto is not as useless as the whole forum talks about it, but its usefulness is severely limited.
Yes, you could do it with the same context, but no one who complains about Auto seems willing to share any specific situations. And yes, ideally you would run it many times.
Auto uses many different models, not just GPT-4.1. Also, just for clarification, when I said it was slower, I meant that if it takes 10 minutes to solve a problem on Auto that you could solve in 5 minutes with a specific LLM, that would be slower in my book, even though Auto maybe ran 30 prompts in the time it took the other LLM to run 10. But yes, GPT-4.1 can be faster and a little dumber; my point is that it still gets there in the end, and Auto doesn’t just use one model. It uses most of the models that are agentic.
I think Auto can be just as useful and productive as any other individual model if you use it correctly, but I also know my use case isn’t that common: I have at least 20 instances of Cursor open at any given time, all on Auto, and I don’t have any of the issues people describe with Auto being unable to do things. For me it takes more time, but Auto always gets there in the end.
I would also argue, and maybe you’d agree, that Auto is good enough to do at least most of the bulk of the work, especially in a new project, and then you should use Claude Opus and maybe some other Max models near the end for the actually complex tasks.
I still give Auto a chance sometimes, but it is too often too dumb to be worth the time. I use it only for analyzing application code, because it has access to semantic search over the repository and, in general, it is still an LLM.
If your budget is not so limited, try o4-mini instead of Auto.
I agree it can complete whatever you tell it to. The problem is I wouldn’t give the result to my worst enemy to maintain later.
Luckily you can have Auto maintain it. Auto does a good job of formatting if you tell it to; it just takes a few extra prompts.
20 instances? Tell me you know nothing without telling me. Clearly, you’re producing slop. I’ve seen Auto switch internally between three different models, each as useless as the next. I gave it a problem to solve today: I was fighting to isolate a gradient explosion in my training pipeline. Auto absolutely could not solve, or even find, the problem at all; just when it started getting some traction, it lost its context (model switch). Eventually I gave the same prompt to Sonnet 4, and it provided the cause, the fix, and safeguards.
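For anyone fighting the same thing, the usual first-line mitigation is gradient-norm clipping plus logging the norm so you can see where it blows up. This is a generic, hedged sketch in PyTorch with a toy model, not the specific fix Sonnet produced:

```python
import torch
from torch import nn

# Toy setup purely for illustration; swap in your own model, optimizer, and data.
model = nn.Linear(16, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
loader = [(torch.randn(8, 16), torch.randn(8, 1)) for _ in range(300)]

MAX_NORM = 1.0  # clip threshold; tune for your setup

for step, (inputs, targets) in enumerate(loader):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    # clip_grad_norm_ returns the total gradient norm *before* clipping,
    # so logging it shows you where the explosion starts.
    total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), MAX_NORM)
    if step % 100 == 0:
        print(f"step {step}: loss={loss.item():.4f} grad_norm={total_norm.item():.2f}")

    # Safety: skip the update entirely if the gradients are non-finite (NaN/Inf).
    if torch.isfinite(total_norm):
        optimizer.step()
```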
I do not use Max or Opus at all. I give the LLMs microtasks to handle so they don’t go insane and hallucinate.
Maybe I know something you don’t. Maybe my documentation is better, maybe my automation systems for letting them check their work are better, maybe I give them smaller steps than you do. All I know is that my software goes to production and I don’t receive bug reports, even though my users love to complain.
Yes, losing context is an issue when the model swaps, so I wonder: what could you do so a model isn’t reliant on having tons of context to solve a problem?
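To illustrate one answer (a hypothetical sketch, since my actual setup isn’t shown here): end every microtask with an automated gate, so the model only ever needs the failing check in its context, not the whole repo. The ruff/mypy/pytest commands below are just examples; swap in whatever your project uses:

```python
#!/usr/bin/env python3
"""Hypothetical per-microtask gate: lint, type-check, and test after each small change."""
import subprocess
import sys

# Example commands; replace with your project's own tooling.
CHECKS = [
    ["ruff", "check", "."],     # lint
    ["mypy", "src"],            # type check
    ["pytest", "-q", "tests"],  # tests (in practice, scoped to the touched module)
]

def main() -> int:
    for cmd in CHECKS:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            # Fail fast: the failing output goes back to the model as its next prompt,
            # so it never needs the whole codebase in context to know what broke.
            print(f"FAILED: {' '.join(cmd)}", file=sys.stderr)
            return result.returncode
    print("All checks passed for this microtask.")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```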
I gave Auto and Claude the same prompt, basically asking them to make Asteroids in an HTML file. I sent about 5 prompts to each; in every case the prompt was identical for both, and the only difference was whether the working directory was called “Auto” or “Claude”. I used Claude 4.0 (non-thinking, non-Max mode) in one directory and Auto in the other.
Auto got me functional code in about 15 percent less time.
Claude got me code where the game would not restart, even though the fourth prompt was specifically about bugs (without naming which bugs the “testers” found) and the fifth prompt was specifically about the restart bug.
Auto made functional code.
Claude’s code had a bug even though that bug was specifically mentioned.
The initial prompt was generated by Gemini 2.5 Pro in the browser, and the other prompts were just mine, based on the issues they both had.
I’d love to see you run a test of both and let me know what the results are. I just don’t understand why people seem to hate Auto so much when, in my testing, it is better, cheaper, and faster. I’d also be very curious what test you would do, or, if you repeated my testing, what results you would get, potentially with different prompts but the same end result in mind.
Same experience here with Auto! Ask it to do some documentation creation and it may handle it halfway OK, but anything to do with code? Like the dumbest developer on the team!
The problem is that when you send a request to Auto, you can’t know which model will be under the hood. The devs say that any model from the list can be there. I’ve never seen Auto thinking, and among non-thinking models there is also a big difference in smartness.
It doesn’t need to think. The “thinking” is just models prompting themselves before giving you a final output, which is exactly what Cursor’s agent mode does: it tries things, evaluates the results, and then makes changes as necessary.
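To make that loop concrete, here is a rough sketch of the pattern: generate, run a check, feed the failure back in, repeat until the check passes. `call_model` and `run_checks` are hypothetical stand-ins for whatever model API and verification you actually use:

```python
import subprocess

def call_model(prompt: str) -> str:
    """Stand-in for whatever LLM call you're using; returns generated code as text."""
    raise NotImplementedError

def run_checks(path: str) -> tuple[bool, str]:
    """Run whatever verification you have, e.g. the project's test suite."""
    result = subprocess.run(["pytest", path], capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

def agent_loop(task: str, path: str, max_rounds: int = 5) -> bool:
    prompt = task
    for _ in range(max_rounds):
        code = call_model(prompt)
        with open(path, "w") as f:
            f.write(code)
        ok, output = run_checks(path)
        if ok:
            return True  # the "final output" only appears after the loop is satisfied
        # Feed the failure back in: this self-prompting is what both "thinking"
        # and agentic retrying amount to.
        prompt = f"{task}\n\nPrevious attempt failed these checks:\n{output}\nFix it."
    return False
```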
I am on Auto for the next 9 days until Ultra + Usage ($100) resets.
It’s okay; I often prompted the higher models when I should have been using a lesser model for lesser tasks … I especially wasted a lot of Opus and got rate-limited out on it first, 2 weeks ago.
Going to exercise a lot more judgement next month
Yea. And the funny part is that for a lot of changes, Auto will do it faster, because you don’t need Claude Opus every time to delete something, change the color of a button, or other super simple stuff like that. And why do those things yourself if Auto will do it faster for free?
After studying my Usage Summary, which I’ll share here, it’s clear the thinking is expensive, and how is the result ever really very different? The Sonnet models are the worker bees; they don’t need to think to do what I want them to do.
But I am happy to pay o3-Pro to think, at ten times the cost of o3. Grok 4 is very good for research too. But switch to Auto to get the code written after o3-Pro lays out the design.
The truth is, there are a lot more code changes I am doing myself now, like shifting blocks of code around or into different modules, just because a screwup is possible if I don’t do it myself.
Tab is working well with its awareness of the context in chat - especially if you highlight the chat context you want to work with.
I was hands-on for decades before IDEs, and just sitting back and letting the AI roll its own is not the best strategy all the time; untethered agents in Auto can be effective, but very messy …
Yea. You can also try asking Auto to find working examples of the code on the internet and then have it integrate them. I find it can sometimes find code better than it can write it.
I can literally compare it to itself 6 months ago. Now it’s like a 5-year-old who needs to be hand-held and constantly reminded not to destroy the house. Take your eye off the kid for 1 second and your whole house is on fire.
I have recently resigned myself to only asking it to do low, junior-level stuff, and it still needs constant attention.
I have found that it is very bad at paying attention to the details of a request. No matter how simple.
It is absolutely useless.


