Auto needs very careful prompts, very careful code review, and more time
Currently, the Cursor AI bug seems to be a serious situation. Even simple commands are using the default 150,000 tokens at once. It wasn't like this before. I look forward to these errors being officially acknowledged soon. Most users are appalled. This is losing corporate trust.
I forgot the link, but as far as I remember the cache read doesn’t cost anything (CMIIW)
they are clearly doing very heavy iteration. weird.
@future Cache reads cost roughly 10% of the input token price. Depending on how much context was used and how long the chat was, the AI may need to read all of that context for processing.
@future this could be a bug and should be fixed. Can you point me to a full bug report with a Request ID about this in the forum so it can be checked in detail?
[Edit: as explained below, a high total cache count does not mean something is wrong; it's showing everything that goes into a request]
Why is there 107k token usage in a single request, and sometimes 10 million tokens in a single request? Those tokens could last a whole month if a good, wise prompt structure were in place. Can the Cursor team explain how those token counts even happen?
hi again lol, wsp. Is that 30x monthly limit 1k requests or 3k requests or something like that? Some people like me do 500-1000 requests per day.
As your screenshot shows, there was:
- Input: short, a sentence.
- Output: changes to files and chat output.
- Cache Write: context needed by the AI (e.g. file reads) cached to avoid re-processing.
- Cache Read: context read from cache (at ~10% of the input token price and ~3% of the cache write price).
107k tokens is not the issue at all. As your screenshot clearly shows, it would have cost about $0.13 at API prices.
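For intuition, here is a minimal sketch of that arithmetic in Python. The per-token prices are Claude Sonnet 4-style list prices and the token split is invented for illustration; neither is Cursor's actual internal data:

```python
# Assumed per-million-token prices (Claude Sonnet 4-style list prices)
PRICES = {
    "input": 3.00,
    "output": 15.00,
    "cache_write": 3.75,   # ~1.25x the input price
    "cache_read": 0.30,    # ~10% of the input price
}

# Invented breakdown of a ~107k-token request, matching the shape above
usage = {
    "input": 300,           # a short prompt, one sentence
    "output": 2_000,        # chat output plus file edits
    "cache_write": 15_000,  # file reads etc. written to the cache
    "cache_read": 90_000,   # earlier context re-read from the cache
}

cost = sum(tokens / 1e6 * PRICES[kind] for kind, tokens in usage.items())
print(f"{sum(usage.values()):,} tokens -> ${cost:.2f}")
# 107,300 tokens -> $0.11
```

The takeaway: most of a large "total tokens" figure is usually cache reads, which are billed at a small fraction of the input price, so a six-figure token count does not imply a large bill.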
Suggestions for further optimization:
- Plan the task beforehand and give the Agent more details to process in one go, with less back-and-forth, as it can handle several changes per edit.
@kristinaOA the total tokens are not all priced equally: input, output, cache read, and cache write each have a different cost per the AI providers' pricing. A high cache read count may mean there was a long thread, and per cache write there were several good-sized reads from files or any other tool.
[Edit: added “or any other tool.” & PS]
PS. Could you start a new chat with privacy disabled and post the request ID here? This would help the Cursor Team check what was sent and see if there are any issues.
@beru no, there are no inflated numbers. The tokens shown are as reported by the AI providers.
Having grown alongside Cursor for some time, I’ve come to understand both my own and many users’ experiences and sentiments. But now I’m beginning to develop some negative feelings.
For any product, bugs are inevitable - developers can only strive toward stability (this applies to the more conscientious developers at least). When problems arise, developers do face them.
But is simply confronting and attempting to fix bugs and functional issues sufficient in the current situation?
I feel we’re being treated like lab rats or test subjects. We’re investing significantly more energy and money than normal circumstances would require, yet we receive no compensation for these financial losses - not even proper acknowledgment, it seems. As mentioned earlier, I believe many other users face similar issues: incorrect calculations or excessive consumption due to the tool’s inherent instability.
The difference lies in the fact that some users can identify these problems while many cannot. Does this mean the company can simply ignore financial losses suffered by those less technically capable users? Or only address issues for users who can identify problems themselves? If this is indeed the case, I’m not just disappointed with the team’s strategists - I would actively encourage everyone to legally defend their rights.
Nevertheless, I still hope both the team and all users can grow in a healthy, positive, diligent, and proactive manner.
Let’s hope there won’t be regrets.
@Kelen thank you for your comment. The Cursor Team is working on keeping things stable and fixing issues as they arise.
Cursor management has responded here in the forum and on social media about the plan changes of the last weeks and has acknowledged issues there.
If you see an incorrect calculation or excessive consumption, please post a full bug report so it can be investigated. There is no inherent instability, but some users may need help with the details.
While there are bug reports here, that does not mean the issues are widespread, common, or a strategy, as you assumed.
How can so many users be wrong? There is something wrong with how the Agent consumes tokens.
Let’s say I have a small code change to make and I reference three 1000-line code files.
If I turn on Agent and ask it to make my implementation change, the number of tool calls it makes, plus the cache writes and reads, causes the entire cost to balloon to $1-2 with Claude 4.
If I turn on Manual mode, ask the same thing in chat, and instead explicitly tell it to:
- Write out the code blocks for its changes as snippets in the chat window
- Rewrite the ENTIRE code file with its code snippets integrated
- Let me copy-paste over the file to apply its changes
It costs like $0.10 - $0.20 with Claude 4.
Let's be honest here: is the product working as intended? The bloated token cost that such a large user base is complaining about is caused by Agent mode. You can avoid all of it by switching to Manual, asking it to output whole 1000+ line files in the chat window, and copy-pasting them over your files, saving 10-100x on cost.
The entire problem lies with apply_edit, or edit_tool, or code_edit.
It doesn't take much to replicate or investigate this.
Like am I being crazy here?
Why is it 10x cheaper to ask the AI to rewrite your ENTIRE code file so you can copy-paste it over the original and check the git diff, than it is to ask the AI to apply the changes itself?
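To make the 10x question concrete, here is a rough back-of-the-envelope model in Python. The round count, token sizes, and the idea that each Agent tool call re-processes the accumulated context are all assumptions for illustration, not a confirmed description of how Cursor's Agent works:

```python
# Assumed per-million-token prices (Claude Sonnet 4-style; illustrative only)
INPUT, OUTPUT, CACHE_WRITE, CACHE_READ = 3.00, 15.00, 3.75, 0.30

ctx = 30_000  # three ~1000-line files at very roughly 10k tokens each

# Manual mode: one round trip -- context sent once, the full file written out
manual = (ctx * INPUT + 10_000 * OUTPUT) / 1e6

# Agent mode (hypothetical): 10 tool-call rounds over a growing context.
# Best case the context is served from the prompt cache; worst case the
# cache misses and the whole context is re-billed at the full input price.
agent_cached = ctx * CACHE_WRITE / 1e6  # initial context cached
agent_uncached = 0.0
for _ in range(10):
    agent_cached += (ctx * CACHE_READ + 3_000 * CACHE_WRITE + 1_500 * OUTPUT) / 1e6
    agent_uncached += (ctx * INPUT + 1_500 * OUTPUT) / 1e6
    ctx += 4_500  # each round's tool output and reply join the context

print(f"manual ~ ${manual:.2f}")                       # ~ $0.24
print(f"agent (cache hits) ~ ${agent_cached:.2f}")     # ~ $0.60
print(f"agent (cache misses) ~ ${agent_uncached:.2f}") # ~ $1.73
```

Under those assumptions the same edit lands anywhere between ~$0.24 and ~$1.70 depending on round count and cache behavior, which is why a request ID showing the actual rounds would settle whether this is a bug or just Agent overhead.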
I'm on Pro+ and I have $55 of $60 consumed with 2 weeks left before renewal. These days, I'm forced to only use gemini-2.5-pro in Manual. Each code edit costs me $0.01-$0.02, because I tell it to write out the full file in the chat window for me to copy-paste.
When I turn on Agent mode, Gemini starts costing $0.20+ per change, which I cannot afford.
[MOD - T1000 - off topic, please use more relevant threads]
I see a few different unclear things which cause misunderstandings.
Could you post a request ID of such an interaction where you think the Agent is going wrong, with privacy disabled for that chat, so the Cursor Team can investigate what happened and why? Answering in general terms is not sufficient, as the details of that request aren't clear.
Agent and Manual are different features, not just a switch in the Chat. Agent actually has more advanced processing and access to more tools, which the AI needs to understand and use.
It's totally fine to use Manual mode as well. That's why it's there.
[MOD - T1000 - off topic, please use more relevant threads]
Sorry for the incoming rant, but now I'm getting frustrated not only with the tool, but with the "support" too!
This should be very easy for the Cursor Team to reproduce!
Many of us have worked on large projects and are well aware that when a large user group starts talking about a problem, the very first thing the team should do is try to reproduce it instead of asking users to supply the issue details!
If you can't reproduce the problem, then you go to the users and ask for more details!
So far I’ve seen multiple posts with details given, including mode and LLMs… that should be more than enough to start investigating.
Hi @kristinaOA and thanks for the feedback. I asked for the request ID as it's not clear what is causing the issue for that user. If you read my other comments in this thread, there is additional information to consider about token counts, and it may not be a bug but rather a misunderstanding.
As a dev with 25+ years of experience, I do consider users' input and would not ask if the issue were so 'easy' to reproduce. I have not seen any widespread issue, but I'm happy to check if you tag me in the reports with @t1000.
Here is my more comprehensive post about token usage.
Instead of sending the same code to the model again and again, as our chat applications do: as a 10+ year back-end developer & CTO of a software company, here is the simple structure I suggest (sketched in code below):
- use a very small proxy LLM to describe code in terms of function names and return values
- pass this to Claude in a smaller format
- summarize Claude's return in the next question
- when searching and collecting code from folders, use the same small LLM
- save Claude for real usage
- give developers a good price to stay, as they are not millionaires
And if you need any help, I'm volunteering.
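A minimal sketch of what that structure could look like, using hypothetical small_llm() and claude() helpers (placeholder functions standing in for whatever client you use, not real Cursor or Anthropic APIs):

```python
def small_llm(prompt: str) -> str:
    # Stand-in for a cheap proxy-model call (hypothetical helper)
    return f"[small-llm output for {len(prompt)} chars]"

def claude(prompt: str) -> str:
    # Stand-in for the expensive main-model call (hypothetical helper)
    return f"[claude output for {len(prompt)} chars]"

def describe_file(path: str) -> str:
    """Proxy LLM condenses a file to function names and return values."""
    with open(path) as f:
        source = f.read()
    return small_llm(
        "List each function as `name(args) -> returns`, one per line, "
        "nothing else:\n\n" + source
    )

def run_task(task: str, paths: list[str], history: str = "") -> tuple[str, str]:
    # The proxy LLM shrinks the codebase view before Claude sees anything
    code_map = "\n".join(describe_file(p) for p in paths)
    # Claude receives only the compact map plus a summary of earlier turns
    reply = claude(f"{history}\nCode map:\n{code_map}\nTask: {task}")
    # Summarize Claude's reply so the next question stays small
    summary = small_llm("Summarize in 3 bullet points:\n" + reply)
    return reply, summary
```

The saving comes from the expensive model never receiving raw file contents; only the small model does, and each main-model turn carries a summary instead of the full history.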