It looks like we’re currently experiencing a bug in Cursor that causes a spike in cache tokens.
Even when a request is sent in a new chat, a large number of cache tokens is consumed, which drives up credit usage. We have confirmed that similar behavior has occurred for other users.
This issue has probably been occurring since February.
Steps to Reproduce
This is a general token-usage issue. Even for a new chat, cache read tokens come in around 980K, 340K, etc. This has only been happening since February.
Expected Behavior
This started last month. When we run a token-usage analysis across the company, we can see a huge spike.
Cache read tokens are generally expected. They mean previously cached context is being reused at a much lower cost than regular input tokens. For example, with Claude Opus 4.6, cache reads cost $0.50 per 1M tokens vs $5.00 per 1M tokens for regular input, so 924K cache read tokens cost roughly what 92K regular input tokens would. More details here: Someone please explain - Why are cache read and write chargeable?
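For reference, here’s that arithmetic as a quick sketch (the prices are the per-million-token figures quoted above):

```python
# Worked cost comparison using the prices quoted above.
CACHE_READ_PER_M = 0.50  # USD per 1M cache read tokens
INPUT_PER_M = 5.00       # USD per 1M regular input tokens

cache_read_tokens = 924_000
cost = cache_read_tokens / 1_000_000 * CACHE_READ_PER_M
equivalent_input_tokens = cost / INPUT_PER_M * 1_000_000

print(f"924K cache reads cost ${cost:.3f}")                         # $0.462
print(f"same cost as {equivalent_input_tokens:,.0f} input tokens")  # 92,400
```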
That said, 924K cache read tokens with 0 cache write in what you describe as a new chat is unusual. To investigate, could you share:
Which model are you using, and is Max mode enabled?
Do you have any project rules (.cursorrules or .cursor/rules) or any MCP servers configured?
A Request ID from one of these high cache read chats. You can get it from the chat context menu (top right) then Copy Request ID.
0 cache write suggests the context was already cached from a prior session. Prompt caching works across chats within a time window, so it’s likely hitting a prefix cache from previous requests. But the size of the cache hit on a simple request is worth looking into.
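To make the mechanism concrete, here’s a toy model of prefix caching with a time window. This is an illustrative sketch only; the TTL value and hash-based matching are assumptions, not Cursor’s or the model provider’s actual implementation:

```python
# Toy model of cross-chat prefix caching (illustrative only).
# Idea: if a new request shares a prefix (system prompt, rules,
# attached context) with a recent request, that prefix is read
# from cache instead of being billed as regular input.
import time

CACHE_TTL_SECONDS = 300  # assumption: cached prefixes expire after a short window

cache: dict[int, float] = {}  # prefix hash -> time the prefix was last cached

def classify_prefix(prefix: str, now: float) -> str:
    """Return whether this request's prefix is billed as a cache read or write."""
    key = hash(prefix)
    last = cache.get(key)
    if last is not None and now - last < CACHE_TTL_SECONDS:
        return "cache read"   # cheap: prefix reused from a prior request
    cache[key] = now
    return "cache write"      # first request pays to populate the cache

now = time.time()
print(classify_prefix("system prompt + workspace context", now))       # cache write
print(classify_prefix("system prompt + workspace context", now + 60))  # cache read (new chat, same prefix)
```

This is why a brand-new chat can show large cache reads with 0 cache writes: the shared prefix was cached by an earlier request that is still inside the window.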
Hi, thank you so much for your quick reply. As I mentioned earlier, this issue is happening on our team account. We are generally not using rules right now, and MCP is not set up yet. Many people use Cursor in different contexts, but in most cases we see a huge spike in cache read tokens.
One surprise is that each new chat also uses a lot of tokens. Please find one Request ID I got from the team: 552f2352-552a-4dc2-b059-bdc283acc78a
I’ll pass along the request ID so we can check this on the server side. But before that, I still need one detail: which model is the team mostly using, and is Max Mode enabled? This matters because token usage varies significantly between models; “thinking” models, for example, use much more context.
Also for context: prompt caching works across chats for a short time window, so even a “new chat” can hit a cached prefix from a recent session. That’s why you can see cache reads even with 0 cache writes, and it’s usually a good thing: cache reads are about 10x cheaper than normal input tokens. But 924K for what should be a simple “from scratch” request is still unusual if there are no rules and MCP isn’t set up.
I’ll share an update as soon as I have more details from investigating the request. In the meantime, if you can confirm the model and Max Mode status, it’ll help speed things up.
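In the meantime, if you want to narrow down which chats are driving the spike, something like the sketch below can slice an exported usage report. The column names (user, chat_id, cache_read_tokens) are hypothetical placeholders; adjust them to whatever your export actually contains:

```python
# Rough audit of team usage from an exported CSV (sketch only;
# column names are hypothetical placeholders).
import csv
from collections import defaultdict

THRESHOLD = 500_000  # flag chats whose cache reads exceed this

per_user: defaultdict[str, int] = defaultdict(int)
flagged = []

with open("usage_export.csv", newline="") as f:
    for row in csv.DictReader(f):
        tokens = int(row["cache_read_tokens"])
        per_user[row["user"]] += tokens
        if tokens > THRESHOLD:
            flagged.append((row["chat_id"], row["user"], tokens))

for user, total in sorted(per_user.items(), key=lambda kv: -kv[1]):
    print(f"{user}: {total:,} cache read tokens")
print(f"{len(flagged)} chats over {THRESHOLD:,} cache read tokens")
```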
Hi Dean
Thank you so much for the reply.
In our company we don’t allow Max Mode, and we don’t allow developers to use MCP for now. I just checked with the team: most people use Grok, some use Auto mode, and I personally used Composer.