We’ve been using Cursor for a while now, and recently something has started to feel off.
Our workload and the complexity of what we’re building haven’t really changed much, but our monthly cost has. We went from under $200 to over $400, and we’re not seeing a clear reason behind that jump.
It feels like something in the background is not as efficient as it could be. Maybe context isn’t being managed well, or there are too many back and forth calls happening just to complete a task with the LLM. We’ve had cases where a single prompt costs around $1, and sometimes Cursor gets stuck trying to figure something out and we end up stopping it. In those cases, we’re paying without getting any real output, which is tough to justify.
We genuinely like Cursor and want to keep using it, but the current cost trend is getting hard to ignore. We’ve started looking into other setups using similar models and tools, mainly to get better control over token usage and cost.
Hoping the team can take a closer look at this. If the costs can come back in line with the value we’re getting, we’d much rather stick with Cursor. Otherwise we’ll need to start looking elsewhere, and I’m sure a lot of other users will start wondering about this too.
Spend limits: If you’re on the Team plan, an admin can set spend limits per member via Settings > Usage. This helps control costs and avoid unexpected spikes.
Models: Max mode and premium models like Opus and GPT-5 use a lot more tokens. If you have them set as default, switching to Auto routing can cut costs a lot since the system will pick the best model for the task.
Context: If an agent gets stuck and you stop it, try starting a new chat instead of continuing a long session. Long chats build up context, and each next request gets more expensive.
Also check whether Cloud Agents or Automations are enabled. They’re billed separately and can be the source of the increase.
Let me know your plan and usage details. With actual numbers, it’ll be much easier to see what changed.
Try checking your usage history. Today I was also shocked when 10 prompts cost me around $8-10 with on-demand usage (which I know is kind of reasonable), but what concerns me is what the data shows: the router was somehow cutting off and reopening at around 50-60K tokens used, and it even switched the model to Haiku. Mind you, this is one session from one prompt.
Same thing happened for me… All of these requests are from one prompt. Is this intentional, or what is going on? I’m speeding through my requests on my request-based plan.
Here’s what’s happening. In Agent mode, each agent step, like reading a file, calling a tool, running a command, or writing code, is a separate API call, and each one is billed as a separate request. A single prompt can easily generate 20 to 30 of these steps. The token growth from line to line (149K to 169K for @IdarDev, 42K to 50K for @ZulfikarHD) is context buildup, since each subsequent step includes the full previous chat history.
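As a rough illustration of the context buildup described above, here is a minimal Python sketch. It is a simplified model, not Cursor's actual billing logic: the function names, the fixed 2,000 tokens added per step, and the 5-step run are assumptions chosen so the numbers land on the 42K to 50K range mentioned for @ZulfikarHD.

```python
# Simplified, hypothetical model of context buildup in one agent run.
# Assumption: every step resends the full history, and each step adds
# a fixed number of tokens (real steps vary a lot in size).

def input_tokens_per_step(start_context: int, tokens_added_per_step: int, steps: int) -> list[int]:
    """Input size of each step when the whole history is resent every time."""
    return [start_context + i * tokens_added_per_step for i in range(steps)]


def total_input_tokens(start_context: int, tokens_added_per_step: int, steps: int) -> int:
    """Sum of input tokens across all steps triggered by one prompt."""
    return sum(input_tokens_per_step(start_context, tokens_added_per_step, steps))


per_step = input_tokens_per_step(start_context=42_000, tokens_added_per_step=2_000, steps=5)
print(per_step)       # [42000, 44000, 46000, 48000, 50000]
print(sum(per_step))  # 230000
```

Under these assumptions, the context only grows by 8K tokens, but the five steps together send 230K input tokens, because the earlier history is counted again on every step.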
@IdarDev, your case is a bit different. Based on the screenshot, you’re using claude-4.6-opus-max-thinking, which is the most expensive combo. Your context is already at 149K to 169K tokens, which is much higher than normal. If Max mode was turned on by accident, turn it off; that will cut usage a lot. Without Max mode, Opus uses far fewer thinking tokens.
A few tips for both of you:
Check if Max mode is on. If it is, turn it off for everyday work. Max only makes sense for very hard tasks.
Try Claude Sonnet or Auto without Max instead of Opus. For most tasks the quality difference is small, but the cost difference is big.
Start a new chat instead of continuing long sessions. The longer the chat, the more tokens each step uses.
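To make the last tip concrete, here is a small Python sketch comparing one long chat with the same number of steps split across two fresh chats. All numbers are hypothetical (a 10K-token starting context, 5K tokens added per step, 20 steps), and the model assumes every step resends the full history. It also ignores the tokens a fresh chat spends re-reading files, so treat the result as an upper bound on the savings.

```python
# Hypothetical comparison: one long session vs. two fresh sessions.
# Assumption: each step's input is the starting context plus everything
# added by earlier steps in the same chat.

def session_input_tokens(start_context: int, tokens_added_per_step: int, steps: int) -> int:
    """Total input tokens for one chat where each step resends the whole history."""
    return sum(start_context + i * tokens_added_per_step for i in range(steps))


BASE_CONTEXT = 10_000   # assumed size of a fresh chat's context
GROWTH = 5_000          # assumed tokens added by each agent step

one_long_chat = session_input_tokens(BASE_CONTEXT, GROWTH, steps=20)
two_fresh_chats = 2 * session_input_tokens(BASE_CONTEXT, GROWTH, steps=10)

print(one_long_chat)                    # 1150000
print(two_fresh_chats)                  # 650000
print(one_long_chat - two_fresh_chats)  # 500000
```

In this toy model the split saves 500K input tokens for the same 20 steps, because the second half of the work no longer carries the first half's history.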
I get what you’re saying. But this wasn’t the case a few days ago. That’s why I’m asking if it is intentional. Before: single prompt = a single request.
I’m not using Max mode. I had requests with millions of tokens of context just 2 days ago.
I downgraded to 3.0.4 and started a new chat and it seems to be back to normal.
@mehdi, @ZulfikarHD, @IdarDev, can you please send the Request ID for the requests where you see the issue? This will help the team figure out what’s happening on the server side.
How to get it: in the chat, open the context menu in the top right corner and select Copy Request ID.
@IdarDev, also let me know which version you rolled back from to 3.0.4.
When Gemini 3 was first added to Cursor, the Flash version was priced reasonably and I was using it constantly. Even when combining it with Auto, I could keep the cost down.
There needs to be transparency from Cursor. Have prices been going up? Or has something changed in the way they handle context and back-and-forth calls?
I understand that the company needs to make money, but it’s getting really concerning that so little work costs this many tokens and this much money.
Same here. Even worse, I didn’t know how I was being charged because the Cursor team doesn’t let you know when you are. The team should be transparent when billing our credit cards.
One thing that I am noticing is that the “thinking” that is happening to get work done is now taking significantly more time and I am guessing this is one of the main places tokens are being burned.
This is 100% an issue / something changed. I’m on a legacy plan, and have 500 requests a month. What used to cost 1 request towards my limit one morning jumped to anywhere from 50 to 90 requests sometime around lunch. I’ve been using Cursor for about 3 months+ pretty constantly, so I have a very good pattern of my usage.
Hey @Chris_P, thanks for the report. A jump from 1 to 50 to 90 requests for a single prompt sounds exactly like what was discussed earlier in this thread with @IdarDev and @mehdi. Each agent step, like reading a file, calling a tool, or editing code, counts as a separate billable request, and in 3.0.x this became more noticeable.
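For a sense of scale, here is a quick back-of-the-envelope calculation in Python using the numbers from the report above (a 500-request monthly quota, and 1, 50, or 90 requests consumed per prompt). The function name is just for illustration, and the sketch assumes every prompt in the month costs the same number of requests.

```python
# Rough quota arithmetic using the figures reported in this thread.
# Assumption: every prompt consumes the same number of requests.

def prompts_per_month(monthly_requests: int, requests_per_prompt: int) -> int:
    """How many whole prompts fit into a monthly request quota."""
    return monthly_requests // requests_per_prompt


for requests_per_prompt in (1, 50, 90):
    print(requests_per_prompt, prompts_per_month(500, requests_per_prompt))
# 1 500
# 50 10
# 90 5
```

So a quota that used to cover about 500 prompts would cover only 5 to 10 prompts at 50 to 90 requests each.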
To dig into your specific case, can you share:
Your Cursor version in Help > About
The Request ID from one of the expensive prompts. In the chat, open the menu in the top right corner and click Copy Request ID
Which model you’re using and whether Max mode is enabled
@IdarDev mentioned that rolling back to 3.0.4 brought back normal behavior (post #7). If you need to stabilize usage ASAP, that’s a temporary workaround.
We’re tracking this issue on our side. I’ll post an update in the thread as soon as I have one.
I don’t have an easy way to get the request id for one specific request for you. (The web ui for usage doesn’t show request ids, and the app prompt history doesn’t indicate how many requests were used… unless I’m missing something.)
I can note that I upgraded to the 3.1.17 build of Cursor, and I think it’s looking better.
The Claude models I had been using are no longer all marked as “Max mode”, so it doesn’t force me into Max mode anymore to use them.
I’ve made at least one prompt in the app, and I’ve seen the expected 1 request consumed a couple of times now.
Hey @Chris_P, thanks for the update, this is a helpful signal.
Yeah, in 3.1.x we made changes that should’ve fixed part of the issue: Claude models are no longer automatically marked as Max, and a single prompt should count as 1 request again in typical cases. Sounds like that’s the effect you’re seeing.
If you notice a single prompt starts eating dozens of requests again, drop the Request ID and the version here and we’ll take a look. For now, I’m marking this in the thread as a partial resolution on 3.1.17.
Same case for me. This is insane and unmanageable. I built a system on the Pro plan and never exceeded the limit in any month, and that led me to add one more account for my teammate despite our moderate usage, so I had 2 seats. But in this month’s cycle I had hardly used it for 10 days when it said the limit was reached. I was surprised, but we were in the middle of a release, so I paid $10 thinking it would last the remaining 6 days of the cycle. It was gone after 5-6 hours of work, which was only a few requests. I then added another $5 and saw around $1 per request. Same project ID, same user, less work, so how is the price going up this much? Especially when so many people are reporting the same thing. I have built agents myself, so I know something on your end must be counting more tokens than before; otherwise the price could not more than double with even less usage than usual. It was always on Auto mode. I have sent you an email too, and I’m expecting a resolution before we move to an alternative, which we don’t want to do. I have been using Cursor for the last 8 months without any issue.
Hey @Akash_Srivastav. About billing for your account specifically (the limit got used up faster than usual, and your top-up was spent in 5 to 6 hours), the team at [email protected] is looking into it. They can check the account details. Since you already wrote them, you’re all set. Just wait for their reply.
In the meantime, to check the technical side here:
What Cursor version are you on in Help > About? In 3.1.x, part of the issue from this thread was fixed (see post #18). Claude models are no longer automatically marked as Max, and usually one prompt equals one request. If you’re on 3.0.x, updating could noticeably reduce usage.
Can you share the Request ID for one of the expensive prompts? In chat, open the top-right menu > Copy Request ID. With that, we can see what exactly happened on the server.
If it turns out your symptoms match what was discussed above, I’ll add your case to what we’re already tracking on our side.
I’m experiencing the same thing. My projects didn’t fundamentally change, yet suddenly I’m spending 3x+ per month and having to work very actively to conserve costs. I’ve also implemented workflow changes to reduce token use and narrow auto-tests, and costs still seem significantly higher than they were in ~March. I’m familiar with all of the general pricing dynamics of Auto vs. Max mode (which I never use) vs. the different model tiers, but I’ve been an active user for a long time and did not fundamentally change the way I operate.
My qualitative interpretation is that each prompt is kicking off a lot more subagents or doing more context fetching, burning tokens at a higher rate without my explicitly asking for it, and I don’t know how to get this under control. I’m trying to lean on Auto more, but the quality is notably worse. Sometimes I can wrangle it at the cost of my own time, but a lot of the time it simply can’t do what I need.