I recently upgraded to the $200 plan, expecting it to accommodate a heavy vibe-coding workflow. However, in just 2 days, I’ve already burned through 10% of my total monthly usage. If this continues, I will exhaust a top-tier enterprise limit in less than three weeks.
For context, I’m a Data AND Solutions Engineer architecting autonomous AI agent systems (heavy use of Django). My workflow is highly structured: I use Opus specifically for planning and architecture, and then hand off to auto for execution.
I am already employing strict token-mitigation strategies. I have a heavily optimized .cursorignore file, and I constantly summarize and start fresh chats to prevent context bloat. Despite doing everything “right,” here is what my dashboard looks like after just 48 hours:
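For anyone curious, here is roughly the shape of my .cursorignore (paths are illustrative, not my actual repo; as far as I know the file uses .gitignore-style patterns):

```
# Dependencies and build output - pure noise as context
node_modules/
dist/
build/
.venv/
__pycache__/

# Large generated or binary assets
*.min.js
*.lock
coverage/
```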
With the old Pro Plus, it felt like I could reliably push 1 billion tokens. Now, reverse-engineering the percentages on my dashboard, it looks like the $200 limit caps out at roughly 3.5 billion total tokens (with Opus throttled heavily). Given my workflow, I was anticipating something closer to 20 billion tokens for that price tier.
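To show how I got to that ~3.5B figure, here is the back-of-envelope extrapolation in plain Python. The 350M-tokens-in-two-days input is my own reading of the dashboard, not an official number:

```python
# Back-of-envelope extrapolation from my dashboard numbers.
# All inputs are my own observations/estimates, not official Cursor figures.

used_percent = 10              # dashboard shows 10% of monthly usage consumed
days_elapsed = 2

# At this burn rate, the whole allowance lasts:
days_to_exhaustion = days_elapsed * (100 / used_percent)   # 20 days

# If that 10% corresponds to ~350M tokens on the meter, the implied cap is:
tokens_so_far = 350_000_000                                # estimated from the dashboard
implied_cap = tokens_so_far * (100 / used_percent)         # ~3.5B tokens

print(days_to_exhaustion, implied_cap)
```

Twenty days is comfortably under a month, which is exactly the cliff I am describing.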
Is this massive spike in token consumption (especially with Composer 2’s cache reading and Opus 4.6’s high-thinking defaults) the intended behavior?
I want to open a discussion on whether the current credit/token math actually scales for developers building complex, multi-file agent architectures. I love Cursor, but if a $200 plan can’t sustain standard, optimized execution loops for a full month, I will have no choice but to explore migrating to other tools.
Would appreciate any insights from the team or others experiencing this cliff.
My Pro AUTO quota has just been reset, and I gave Cursor one simple task: “delete my /node_modules folder and then run pnpm install to restore it.” And BOOM, it took 1% of my quota. I remember six months ago, when I started with Cursor, it took forever to reach the limit. Now I have the impression that with each month that passes, my quota drains more rapidly. I was wondering whether, if I cancel my subscription and start over with a new e-mail and credit card, the Cursor algorithm would treat me as a new user worth winning over and be less aggressive with quota spend, at least for another 6 months…
The System will remove this topic as soon as I post it, because they don’t want me to ask that kind of question in front of everyone. But actually, this is the right place to ask ourselves why this happens and whether what we are experiencing is deliberate, because AI models are becoming more expensive each month, with features being stripped out of Perplexity, Claude, Gemini and ChatGPT. I think it is fair to ask Cursor to be transparent with us about that.
Yeah, that 1% hit for a simple folder delete is insane, but it perfectly highlights the issue.
When you use Auto for that, it’s not just running rm -rf. It’s doing a massive background scan reading your lockfiles, package.json, and half your directory just to “reason” through a basic install. You’re paying high-tier tokens for it to overthink a dumb task.
I highly doubt making a new account will save you. It’s not an algorithm targeting old users; it’s just the new Composer 2 being way too context-hungry compared to 6 months ago.
This is exactly why my $200 plan is bleeding out. We desperately need:
A “dumb terminal” mode that just executes without reading the whole codebase.
Real transparency on reading vs. reasoning tokens per action.
If we’re burning 1% on basic maintenance, the math just doesn’t scale for actual engineering work.
This already exists: CMD+K in the terminal. Removing node_modules and reinstalling dependencies is not a good task for the full-blown agent.
Still, I do agree with the core of the complaints in this thread. I’m a daily Cursor user but wouldn’t consider myself a power user; I still do a lot of my own dev work, and when I use AI, I give it targeted prompts for focused tasks. I used to never worry about usage and never hit my limit, but now it’s terrible. For the last two months I’ve found myself hitting the limit by halfway through the month.
Hey @GIL101, thanks for the detailed post, and I’m happy to share some details that may help you. As you may know, the Ultra plan includes two very generous pools of usage. The first is the API usage pool; the Ultra plan includes at least $400 a month of API usage. The second is the Auto + Composer pool. The $ amount of usage here is not publicly stated, but it is VERY generous and offers a lot of value. You can seriously hammer away at Auto and Composer usage for a very long time before you use up the included Ultra Auto and Composer usage.
The API pool usage can seem to be consumed faster, but that depends primarily on the models that you’re using and what you’re asking them to do. Opus 4.6 high-thinking is by far one of the most expensive models per token (reasoning tokens bill at output rates). Composer 2’s big numbers are mostly cache reads, which are heavily discounted, so the dashboard volume tends to look scarier than the actual spend.
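To make the cache-read point concrete, here is a toy cost model. The rates are purely illustrative assumptions (not Cursor’s or any provider’s actual pricing); the point is only the shape of the math: a request dominated by cache reads can post a huge token count while costing less than a much smaller, output-heavy request.

```python
# Illustrative only: these rates are assumptions, not real pricing.
PRICE_PER_MTOK = {
    "input": 15.00,       # $/million tokens (assumed)
    "cache_read": 1.50,   # cache reads often bill at a fraction of input (assumed)
    "output": 75.00,      # output/reasoning tokens (assumed)
}

def cost(tokens: dict) -> float:
    """Dollar cost for a usage breakdown expressed in tokens per category."""
    return sum(tokens[k] / 1_000_000 * PRICE_PER_MTOK[k] for k in tokens)

# A request whose dashboard "volume" is dominated by cache reads...
big_looking = {"input": 20_000, "cache_read": 900_000, "output": 5_000}
# ...versus a much smaller-looking request that is mostly fresh output.
small_looking = {"input": 20_000, "cache_read": 0, "output": 60_000}

print(f"big-looking:   {sum(big_looking.values()):,} tokens, ${cost(big_looking):.2f}")
print(f"small-looking: {sum(small_looking.values()):,} tokens, ${cost(small_looking):.2f}")
```

Under these assumed rates, the 925K-token request is cheaper than the 80K-token one, which is why raw token counts on the dashboard can mislead.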
Model providers have increased the cost of the latest frontier models significantly, so if you’re using Cursor Agents for the same type of requests but you’ve switched to the latest models (e.g. Opus 4.7, GPT 5.5), it may seem that you’re consuming your usage faster because these higher costs are eating your usage quicker.
We don’t offer a non-reasoning version of Composer, but it is designed to strike a great balance between high performance and cost efficiency. I recommend disabling fast if you want to be as cost-efficient as possible.
Appreciate the transparency, @kevinn. Knowing the API pool has a $400 floor makes the math feel a bit more grounded, but that dashboard is still a total jump scare for anyone building at scale.
The issue isn’t just using the frontier models; it’s the invisible overhead. Even with strict .cursorignore and fresh chat resets, the “reasoning tax” on Opus 4.6 and the aggressive cache-reading in Composer 2 are moving the goalposts. If the dashboard “looks scarier than the spend,” maybe we need a toggle to see actual credit depletion rather than just massive token counts that trigger panic.
@troehrkasse , I hear you on CMD+K for the small stuff, but when you’re deep in the flow orchestrating complex backend agents, you don’t want to break context to babysit the terminal. The whole point of the $200 “Ultra” tier should be to stay in that agentic flow without feeling like you’re walking on eggshells.
If the agent is smart enough to architect a system, it should be smart enough to know when not to “meditate” on a pnpm install.
I’ll keep hammering away and see where I land at the end of the month, but a “Low-Reasoning” toggle for Composer steps would be a massive win for those of us who need the “brain” for the plan but just want the “hands” for the execution.
Check out our new context window usage breakdown, which is intended to provide more clarity on what’s going into the prompts - this can help you manage your token usage too.
This new Context Window Breakdown is exactly the kind of transparency we’ve been asking for, @kevin , so thank you for pointing it out.
However, actually looking at the data perfectly illustrates the “invisible overhead” problem I am talking about. It proves exactly why my tokens are vanishing.
I pulled up my own context breakdown (see the second screenshot), and I am sitting at 166.3K tokens (83% of a 200K window) for a single request.
The glaring issue is the bucket distribution:
Skills: 92.2K tokens
Conversation: 61.9K tokens
Tools: 6.7K tokens
As a Data AND Solutions Engineer architecting autonomous multi-agent platforms, my “Skills” bucket has to be massive. I am forcing the agent to hold deep, complex rules about how Django, CrewAI, and Dify interact within my specific infrastructure. I can’t simply “tighten my setup” without completely lobotomizing the agent’s understanding of my architecture.
Here is where the math completely breaks the $200 plan: because my required baseline context is over 160K tokens, every time Auto enters an execution loop, it drags that massive payload with it. If it loops 5 times to troubleshoot a minor bug or verify a package installation, I am burning nearly 1 million tokens in two minutes just on cache reads of my own rules and history.
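The arithmetic behind that claim, sketched out. The baseline is my measured context size from the breakdown above; the loop count of 5 is an estimate:

```python
# My measured numbers from the context window breakdown; loop count is an estimate.
baseline_context = 166_300   # tokens re-sent with every agent step (83% of a 200K window)
loops = 5                    # a modest troubleshooting loop

tokens_burned = baseline_context * loops
print(tokens_burned)  # 831,500 tokens before any real work gets done
```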
This completely validates my previous point. If a power user’s environment requires 92K of “Skills” just to function correctly, we are being actively penalized every time we trigger an action.
We desperately need a way to tell the agent: “Just run this basic execution step, and do NOT re-read the 92K skills payload to do it.” Without a low-context or “dumb execution” override, the platform simply does not scale economically for complex engineering projects.
I’ve cleaned up my skills and yessss, that was the issue. Damnnn, 91K every request. Shouldn’t the skills be better managed, though, and only be pulled in when needed or used? But yes, this solves that issue, so maybe I can code crazily and see the outcome.
My guess is that you probably had a lot of model-invokable skills, and some of that context was getting passed into each prompt. You can make your skills non-model-invokable as follows:
---
name: mySkill
description: My skill does this.
disable-model-invocation: true
---
And then whenever you want to use the skill you just say /mySkill
So you still have access to your skills; you just need to request them manually. This avoids them cluttering up your context and keeps your prompts light!
OMG, me too. I had a bunch of skills that were set to allow the agent to use them intelligently; it seems like they were always being included in full even though they weren’t being used…
I didn’t see this mentioned, so I’m just adding something that helped me a lot when I blew through my usage last month (via Auto).
My mental model was that Auto would intelligently choose the least expensive model for the task but that was not my experience.
Now, I use Composer (not fast) for execution of small tasks (like your node_modules issue). Auto (for me) was pulling Opus 4.7 and other VERY expensive models for simple tasks, and sometimes falling back to Composer (fast), which is still 3x as expensive.
Note: to toggle off ‘fast’ you have to click on the model selector, hover over Composer 2, then click the small ‘edit’ button there (it’s quite buried). But this will hopefully help you avoid running over your budget, along with the Skills issue you mentioned above.
Personally I wish the default would be set to ‘Composer 2’ (not fast) because imo it’s suitable for like 90% of my (well defined) tasks and wayyyy cheaper.
Very glad to hear that the new context usage indicator is helpful!
Yes, my suggestion is to go light on rules and model-invokable skills, and create them only when you have a consistent, recurring need for the model to invoke them automatically. Anything you can invoke manually via skills, do that; personally, I have a ton of user-invokable skills that I use.
Auto selects models that balance intelligence, cost efficiency, reliability, and availability (e.g. regional availability, model availability).
Auto is slightly more expensive than Composer 2 regular, but cheaper than Composer 2 fast.
Composer 2 is on Fast by default (as announced in the Composer 2 launch post) because we think it provides the best and most enjoyable experience for users.