I recently bought a Pro subscription, but I ran out of my quota after just 3 hours. This happened because I was working in a large codebase, and the AI used a lot of credits while gathering extensive context.
It would be really useful to have an “Eco Mode”: a feature where a smaller, cheaper model first collects and summarizes the necessary context, and only then passes it to the main, frontier model for the final response.
This could significantly reduce credit usage, especially for developers working with large projects. It would also make the workflow more efficient and sustainable.
At least, that’s how I think it could be implemented.
That’s because you didn’t understand it.
The request is actually pretty interesting; the idea just needs to be refined.
Sometimes the model makes a lot of requests just to gather context. A smart Cursor algorithm or model could detect the need to gather context, forward that work to a cheap model, let the cheap model do the tool calls and collect all the context, and then pass it to a better model.
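For what it’s worth, the two-stage flow being described could be sketched roughly like this. Everything here is hypothetical: `call_model`, the model names, and the prompts are stand-ins for illustration, not a real Cursor API.

```python
# Hypothetical sketch of the proposed two-stage "Eco" pipeline.
# call_model() and the model names are made up for illustration.

def call_model(model: str, prompt: str) -> str:
    """Stand-in for an LLM API call; returns a canned response here."""
    return f"[{model} response to: {prompt[:40]}...]"

def eco_pipeline(task: str, files: list[str]) -> str:
    # Stage 1: a cheap model reads each file and keeps only what is
    # relevant to the task (the "context gathering" step).
    gathered = []
    for path in files:
        summary = call_model(
            "cheap-model",
            f"Summarize what in {path} matters for: {task}",
        )
        gathered.append(summary)
    context = "\n".join(gathered)

    # Stage 2: the frontier model sees the compressed context,
    # never the raw files.
    return call_model("frontier-model", f"Context:\n{context}\n\nTask: {task}")

print(eco_pipeline("fix the login bug", ["auth.py", "session.py"]))
```

The open question, as the replies below point out, is whether stage 1 can be done by a model cheap enough to save money yet smart enough not to discard the context that actually matters.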
This makes no sense: the analyzer model will cost more than the cheap model it is supposed to save money for. Imagine calling GPT-5 for every tenth Grok Code Fast call, just to compress the context.
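The arithmetic behind this objection can be made concrete. The per-token prices and token counts below are invented for illustration only; the point is the shape of the comparison, not the exact numbers.

```python
# Illustrative arithmetic only: prices and token counts are made up.
cheap_price = 0.20      # $ per 1M tokens, hypothetical cheap model
frontier_price = 10.00  # $ per 1M tokens, hypothetical smart analyzer

raw_context = 200_000        # tokens of raw codebase context
compressed_context = 20_000  # tokens after the analyzer compresses it

# Without an analyzer: the cheap model just reads the raw context.
without_analyzer = raw_context / 1e6 * cheap_price

# With an analyzer: the smart model must read the raw context to
# compress it, then the cheap model reads the compressed version.
with_analyzer = (raw_context / 1e6 * frontier_price
                 + compressed_context / 1e6 * cheap_price)

print(f"without analyzer: ${without_analyzer:.3f}")
print(f"with analyzer:    ${with_analyzer:.3f}")
```

Under these assumptions the analyzer pass dominates the bill, because the expensive model still has to read everything before anything can be compressed.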
And you can’t use a stupid, cheap model for the analysis, as it will only make things worse.
You can either manually call /Summarize at the right time, or set a stricter limit on the context window, after which automatic summarization kicks in.
Cursor already has an “Eco” mode: it’s the standard Agent mode, and the non-Eco option is MAX.
An evaluator shouldn’t be dumber than the model it’s judging; ideally, it should be smarter. Imagine you’re working in a workshop, and suddenly a kid comes in and puts away the tools he thinks you don’t need anymore.
You have a good point; maybe it doesn’t make as much sense at the current levels of intelligence and cost of some models. GPT-5, to me, is pretty cheap and smart.
Sonnet is expensive, as you can tell. I am on the same plan, and I try to avoid using Sonnet unless everything else (Auto, gpt-5-mini, grok-code, gpt-5) is not cutting it, or I know it’s a particularly complex request.
Your Eco idea is interesting, but I think using cheaper models is the closest thing we have to an Eco mode right now. Have you tried starting the chat with a cheaper model, then switching to a more expensive one once a plan is made? But as @Artemonim said, it may be counterproductive to have a cheaper model do the context gathering.
I think just avoiding Sonnet is the solution for most of us lighter users.