Cache read token

Where does the bug appear (feature/product)?

Cursor IDE

Describe the Bug

It looks like we’re currently experiencing a bug in Cursor that causes a spike in cache tokens.
Even when you send a request in a new chat, a large number of cache tokens are consumed for some reason, resulting in increased consumption of credits. We have confirmed that similar events have occurred for other users.
This has probably been occurring since February.

Steps to Reproduce

This is a general token usage issue. Even for a new chat, the cache read token count seems to be around 980K, 340K, etc. This has only happened since February.

Expected Behavior

This started last month. When we ran a token usage analysis across the company, we could see a huge spike.

Screenshots / Screen Recordings

Operating System

MacOS

Version Information

Version: 2.6.12 (Universal)
VSCode Version: 1.105.1
Commit: 1917e900a0c4b0111dc7975777cfff60853059d0
Date: 2026-03-04T21:41:18.914Z
Build Type: Stable
Release Track: Default
Electron: 39.6.0
Chromium: 142.0.7444.265
Node.js: 22.22.0
V8: 14.2.231.22-electron.0
OS: Darwin arm64 24.4.0

Additional Information

As we are using Cursor across the company, this is really affecting the planned company-wide rollout.

Does this stop you from using Cursor

Sometimes - I can sometimes use Cursor

Hey, thanks for the report.

Cache read tokens are generally expected. They mean previously cached context is being reused at a much lower cost than regular input tokens. For example, with Claude Opus 4.6, cache reads cost $0.50 per 1M tokens vs $5.00 per 1M tokens for regular input, so 924K cache read tokens cost roughly the same as about 92K input tokens. More details here: Someone please explain - Why are cache read and write chargeable?
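The cost comparison above can be sketched as a quick calculation. The per-million-token rates are the ones quoted in this reply for Claude Opus 4.6; treat them as illustrative rather than authoritative pricing:

```python
# Illustrative cost comparison for cache read vs regular input tokens.
# Rates are the ones quoted above ($0.50 vs $5.00 per 1M tokens for
# Claude Opus 4.6); actual pricing may differ.
CACHE_READ_PER_M = 0.50   # USD per 1M cache read tokens
INPUT_PER_M = 5.00        # USD per 1M regular input tokens

def cost(tokens: int, rate_per_million: float) -> float:
    """Cost in USD for a given token count at a per-1M-token rate."""
    return tokens / 1_000_000 * rate_per_million

# 924K cache read tokens at the cache read rate:
cache_cost = cost(924_000, CACHE_READ_PER_M)
print(f"924K cache reads: ${cache_cost:.3f}")  # $0.462

# How many regular input tokens would cost the same:
equivalent_input = cache_cost / INPUT_PER_M * 1_000_000
print(f"equivalent input tokens: {equivalent_input:,.0f}")  # 92,400
```

So a large cache read count translates to roughly a tenth of the cost of the same number of regular input tokens at these rates.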

That said, 924K cache read with 0 cache write in what you describe as a new chat is unusual. To investigate, could you share:

  • Which model are you using, and is Max mode enabled?
  • Do you have any project rules (.cursorrules or .cursor/rules) or any MCP servers configured?
  • A Request ID from one of these high cache read chats. You can get it from the chat context menu (top right) then Copy Request ID.

0 cache write suggests the context was already cached from a prior session. Prompt caching works across chats within a time window, so it’s likely hitting a prefix cache from previous requests. But the size of the cache hit on a simple request is worth looking into.

Let me know and we’ll dig deeper.

Hi, thank you so much for your quick reply. Like I mentioned earlier, this issue is happening in our team account. We are generally not using rules right now, and MCP is not set up yet. Many people are using Cursor in different contexts, but in most cases we can see a huge spike in cache read tokens.

One surprise is that each new chat also uses a lot of tokens. Please find one Request ID which I got from the team: 552f2352-552a-4dc2-b059-bdc283acc78a

We only noticed this from mid-February.

Thanks for the extra info.

I’ll pass along the request ID so we can check this on the server side. But before that, I still need one detail: which model is the team mostly using, and is Max Mode enabled? This matters because token usage varies significantly between models; for example, “thinking” models use much more context.

Also for context: prompt caching works across chats for a short time window. So even a “new chat” can still hit a cached prefix from a recent session, which is why you can see cache reads even with 0 cache writes. That’s usually a good thing, cache reads are about 10x cheaper than normal input tokens. But 924K for what should be a simple “from scratch” request is still unusual if there are no rules or MCP isn’t set up.

I’ll share an update as soon as I have more details from investigating the request. In the meantime, if you can confirm the model and Max Mode status, it’ll help speed things up.

Hi Dean
Thank you so much for the reply.
In our company we don’t allow the use of Max Mode, and we don’t allow developers to use MCP right now. I just checked with the team: most people use Grok, some use Auto mode, and personally I use Composer.

Hope this will help you

This topic was automatically closed 22 days after the last reply. New replies are no longer allowed.