Hello! Thanks for your contribution. This is an excellent question, and one that many Cursor users run into. Let me explain why this happens:
Why Does Cursor Consume So Many Cache Read Tokens?
The high consumption of cache read tokens in Cursor is normal system behavior, even though it can look excessive. The 178,304 cache read tokens you mention are typical even for small changes in a single file.
How the Cache System Works
When you work in Cursor, the process works like this:
1. **Cache write:** On the first request, Cursor sends the full context (files, system rules, prompts) to the AI provider, which processes it and stores it in a cache
2. **Cache read:** On each subsequent interaction, that stored context is reused from the cache, but it is still counted as cache read tokens
3. **Accumulation:** With each tool call and each follow-up response, the context grows, so cache read tokens accumulate
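The steps above can be sketched with a toy simulation. The function, the turn counts, and the token numbers below are illustrative assumptions, not Cursor's actual accounting:

```python
def simulate_turns(initial_context: int, per_turn_growth: int, turns: int):
    """Return (cache_write_tokens, total_cache_read_tokens) for a chat thread."""
    cache_write = initial_context       # turn 1: the whole context is written to cache
    cache_read_total = 0
    context = initial_context
    for _ in range(turns - 1):          # every follow-up re-reads everything cached so far
        cache_read_total += context
        context += per_turn_growth      # tool calls and replies keep growing the context
    return cache_write, cache_read_total

# Hypothetical thread: 20k tokens of initial context, 5k added per turn, 8 turns
writes, reads = simulate_turns(initial_context=20_000, per_turn_growth=5_000, turns=8)
print(writes, reads)  # cache reads quickly dwarf the single cache write
```

Even with these modest assumed numbers, the cache reads total more than ten times the one-time cache write, which is why the read figure dominates your usage report.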
Why the Numbers Are So High
Cache read tokens can represent 84-99% of total token usage. This happens because:
- Cursor resends the entire conversation (including previous outputs) on each interaction
- Each tool call and each follow-up request sends the complete chat history as input
- The model still needs to process all of that context to generate a response, even though it's cached
The Silver Lining: Cost Savings
Although the numbers look alarming, cache read tokens are much cheaper than normal input tokens:
Anthropic charges 10% of the input-token price for cache reads, and Gemini charges 25%. Without the cache, you would be paying roughly 4-10 times more for the same tokens, depending on the provider.
How to Reduce Cache Read Consumption
To optimize your token usage:
- Keep each chat focused on a single task
- Start a new chat for each new task (avoid long threads)
- Attach only the files you actually need to the context
- Use simpler models for simple tasks
- Keep your rules short and focused
- Disable MCP tools you don’t need
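The "start a new chat" tip has the biggest effect, because cache reads grow with the accumulated context. A rough comparison, using assumed token numbers and the simplifying assumption that every follow-up re-reads the full history:

```python
def cache_reads(context: int, growth: int, turns: int) -> int:
    """Total cache read tokens for one thread (assumed linear context growth)."""
    total = 0
    for _ in range(turns - 1):
        total += context     # each follow-up re-reads the accumulated context
        context += growth
    return total

# One long 16-turn thread vs. the same work split into four 4-turn chats
long_thread = cache_reads(context=20_000, growth=5_000, turns=16)
short_chats = 4 * cache_reads(context=20_000, growth=5_000, turns=4)
print(long_thread, short_chats)
```

Under these assumptions, splitting the work cuts cache reads by more than half, because each fresh chat resets the context instead of re-reading a long history every turn.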
It’s important to understand that, although the absolute numbers are high, this cache system is designed to reduce costs, not increase them. Without it, you would be paying full price for every token on every interaction.