Why does Cursor consume an absurd amount of cache read tokens?

Just to call out another example: I have a repo open where the tool definitions, system prompt, and other information (rules and skills I’ve defined) take up ~47.5k tokens of context. No files from my repo are included in this starter context.

I just sent “hi”, nothing else. But because I’d been working in other chats in the same repo, the provider’s cache already had most of that prefix, so it shows up as 47,499 cache read tokens and only 171 input tokens. The cache is doing exactly what it should: avoiding re-processing tokens the provider has already seen.
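
To make the accounting concrete, here’s a minimal sketch of how a request’s tokens divide into cache reads versus fresh input under prefix caching. The numbers come from the example above; the split itself happens provider-side, so the function here is just illustrative:

```python
# Minimal sketch of how a provider splits one request's prompt into
# cache read tokens vs. fresh input tokens under prefix caching.
# Numbers are from the example above; the real accounting is provider-side.

def split_tokens(prompt_tokens: int, cached_prefix_tokens: int) -> tuple[int, int]:
    """Return (cache_read_tokens, input_tokens) for one request."""
    cache_read = min(prompt_tokens, cached_prefix_tokens)
    fresh_input = prompt_tokens - cache_read
    return cache_read, fresh_input

# Stable prefix: tool definitions, system prompt, rules, skills (~47.5k tokens).
PREFIX = 47_499

# Sending just "hi" adds a handful of new tokens on top of the cached prefix.
cache_read, fresh = split_tokens(prompt_tokens=PREFIX + 171, cached_prefix_tokens=PREFIX)
print(cache_read, fresh)  # 47499 171
```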

Imagine I submit this prompt:

> read files and then decide the next file to look at. Do this 10 times, and make sure you think in between.

No surprise: this session racked up a huge number of cache read tokens. It took 13 requests and eventually opened a file of ~17k tokens, which was then included in every subsequent request as cached tokens.
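
Here’s a rough simulation of how cache reads compound over a session like that. The per-step context growth and the step at which the large file is opened are assumptions for illustration, not measurements from Cursor:

```python
# Rough simulation of cache read growth across the 13-request session
# described above. STEP_TOKENS and BIG_FILE_STEP are assumed values:
# each step appends ~500 tokens of tool output, and one step pulls in
# a ~17k-token file that then rides along in every later request.

PREFIX = 47_499          # stable starter context from the example above
STEP_TOKENS = 500        # assumed per-step tool results
BIG_FILE_TOKENS = 17_000
BIG_FILE_STEP = 6        # assumed: the large file is opened mid-session

context = PREFIX
total_cache_reads = 0
for step in range(1, 14):  # 13 requests in the session
    # Everything already in context was seen by the provider on the
    # previous request, so it is billed as cache reads; only the newly
    # appended tokens count as fresh input.
    total_cache_reads += context
    context += STEP_TOKENS
    if step == BIG_FILE_STEP:
        context += BIG_FILE_TOKENS

print(f"cache reads across session: {total_cache_reads:,}")
# ~775k cache reads, even though fresh input per request stays small.
```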

One factor that may contribute to the perception of higher cache token usage is that models and our agent harness have improved at sustained, multi-step work. A single message now often triggers 10+ LLM calls autonomously, rather than 3-4. The total work (and tokens) is similar to what multiple shorter turns would have consumed, just rolled up into one line item.
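
A quick sketch of that last point, assuming a fixed per-call context growth: twelve LLM calls cost the same total prompt tokens whether they run as one autonomous turn or as four shorter turns of three calls each. Only the per-message attribution changes:

```python
# Sketch of the "rolled up into one line item" point. The per-call
# context growth is an assumption for illustration.

PREFIX = 47_499
STEP = 500  # assumed tokens added to context per call

def tokens_for_calls(n_calls: int, start_context: int) -> tuple[int, int]:
    """Total prompt tokens over n calls, plus the final context size."""
    total, context = 0, start_context
    for _ in range(n_calls):
        total += context   # whole context is re-sent (mostly as cache reads)
        context += STEP
    return total, context

# One autonomous turn of 12 calls.
one_turn, _ = tokens_for_calls(12, PREFIX)

# Four user turns of 3 calls each, carrying the context forward.
four_turns, context = 0, PREFIX
for _ in range(4):
    spent, context = tokens_for_calls(3, context)
    four_turns += spent

print(one_turn, four_turns)  # identical totals, different line items
```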