Cache Read Bug using 90+ Million Tokens (120x more than it should)

Yes, it’s conjecture, but I’ve been wondering the same thing. I always wondered how they paid for all the free user tiers as well. Pricing for Amazon’s Kiro.dev has been floating around: Pro ($19/mo for 1,000 agent interactions) and Pro+ ($39/mo for 3,000). Not sure if it’s correct, but that would be another company offering Claude 4 Sonnet at prices and usage rates better than what we are used to.

If Cursor isn’t going to properly fix Pro and Pro+ accounts that used to last all month but now drain in 3 and 7 days respectively, I see a mass exodus. Very few people are going to be able to pay thousands of dollars a month to use Cursor.

This is incredibly unfortunate. If I had to choose an IDE it would be Cursor, and I would pay the $200/mo Ultra subscription all year long to use Cursor with the same usage rates as before, which lasted all month.

@danperks @Zackh1998

The Cache Read bug is getting worse since the 2 billion token screenshot above. People are reporting that even with 0 cache writes, cache reads were still 43 million tokens! What cache is being read if there was no cache write?

Direct Link to Post: Frustrated with Cursor’s Sudden Token Drain and Access Restrictions - #187 by ihate_Dave


FYI, I added an Anthropic API key and kept using Claude 4 Sonnet MAX Thinking, and my cost is way, way down (normal-like).

This is from Anthropic’s console.
Pure Cursor usage is the last hour, I think; if it’s both hours, the difference is even crazier, because it all cost me $7.97 :police_car_light:

@danperks @Zackh1998 We are still waiting on your answers to these 3 simple questions. Please provide clarity to get to the bottom of this:

  1. What technical changes have caused this 10x increase in token consumption? Users are performing the same development work on the same projects with identical context.
  2. Why is there such a significant discrepancy between advertised request limits and actual usage?
  • Pro+ advertises 675 requests → I received 60-70, when it used to last all month with leftover tokens
  • Pro advertises 225 requests → users who used to get a month’s usage report depletion in 2-3 days
  3. Can you provide an option to revert to the previous settings that didn’t cause excessive cache read consumption and the 10x drain?

Technically, something is causing this 10x increase in token usage for the same work on the same dev project with the same context. Can’t you offer the option to turn back on the previous settings that didn’t cause a 10x cache read drain? I really hope this isn’t a bait and switch to get users to pay full API fees rather than usage rates that match the number of requests our subscriptions provide, as listed at Cursor – Models & Pricing.

Expected usage within limits

Expected usage within limits for the median user per month:

  • Pro: ~225 Sonnet 4 requests, ~550 Gemini requests, or ~650 GPT 4.1 requests
  • Pro+: ~675 Sonnet 4 requests, ~1,650 Gemini requests, or ~1,950 GPT 4.1 requests
  • Ultra: ~4,500 Sonnet 4 requests, ~11,000 Gemini requests, or ~13,000 GPT 4.1 requests

Cursor is down for me, so here are new stats from using the API key:



I’ve just sent a request for investigation and refund of my use in May, June and July…

On average I paid ca. $0.73/Mtok via API key vs. roughly $10-15/Mtok directly via Cursor :police_car_light:


I think Zackh1998’s post perfectly corroborates my previous findings. As he said:

But the system prompt encourages the model to call tools to understand all the details, encourages it to explore multiple implementation methods and multiple search keywords, forces it to use different keywords across multiple searches, requires it to break the task down into subqueries repeatedly, and requires it to use tools as much as possible. In a complex project, such prompts will inevitably lead to explosive growth in token usage.

But I think the situation might be even worse. In a slightly more complex project, to understand every detail of a particular file, it might even be necessary to read half of the project’s files. This could cause the initial 20,000 tokens of context added by the user to expand to an incomprehensible size, far more than just 30,000. Then, because the system encourages or even forces the model to make more tool calls, this massive context is repeatedly included, ultimately leading to enormous token consumption.

P.S.: Zackh1998’s post merely explains that the ratio of input tokens to cached tokens is normal (1:9), but it avoids the most important question: why such massive token usage occurred in the first place. According to his post, cache reads can be understood as inexpensive input tokens. Price aside, cache reads are essentially input. That is, for the friend who consumed 2 billion cache read tokens, the actual input for that request was over 2 billion plus 6.6 million tokens, while only about 10 million tokens were output. The input-output ratio reaches 200:1, which is far too absurd.

Additionally, I have another question. Someone mentioned in a post that their Pro plan ran out of tokens after using $86 worth. I remember a developer once replied to someone in the forum saying that he had used more than $150 worth of tokens. So why did my Pro plan run out after spending $64? I want to know what caused such a significant discrepancy. Let’s set aside whether the pricing is reasonable for now; at the very least, every user should be treated fairly, right?

An update from the team on the token consumption is in the other thread:


For Claude 4, the reported total spend is $1,019.88, but the token-based calculation gives $850.98. The difference is $168.90, almost 20% extra.

Gemini charged me for cache READs while doing ZERO cache WRITEs…

Cursor charges a 20% fee; not sure if it’s included in the calculations, but my guess is yes.
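
If that 20% fee is applied on top of the token-based cost from the post above, the numbers nearly line up. Here is a quick check; the two dollar figures are copied from that post, and the 20% markup is the assumption being tested:

```python
# Does a 20% markup on the token-based API cost explain the reported spend?
token_based_cost = 850.98   # cost computed from token counts (post above)
reported_spend = 1019.88    # total spend reported by the dashboard
with_markup = token_based_cost * 1.2
print(f"token cost x 1.2 = ${with_markup:.2f}  (reported: ${reported_spend:.2f})")
# -> $1021.18, within about $1.30 of the reported spend, which suggests
#    the 20% fee is indeed already baked into the reported number.
```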

@CK-iRonin.IT I can provide more info:

  • Token price in Cursor is 1.2× the AI providers’ API pricing.
  • Gemini does not report cache writes separately, so input includes cache writes. But Gemini charges more for cache reads (25% of the input price instead of Anthropic’s 10%).
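
For anyone who wants to sanity-check their own numbers, here is a minimal cost-estimator sketch based on the figures in this thread. The per-token rates are assumed Anthropic Sonnet 4 list prices, and the 1.2× margin comes from the bullet above; this is a back-of-the-envelope model, not Cursor’s actual billing logic.

```python
# Rough cost estimator for token usage, using assumed Sonnet 4 list prices
# (USD per 1M tokens) and the 1.2x Cursor margin mentioned above.
SONNET_RATES = {
    "input": 3.00,
    "cache_write": 3.75,
    "cache_read": 0.30,   # 10% of the input price (Anthropic)
    "output": 15.00,
}
CURSOR_MARGIN = 1.2       # "Token price in Cursor is 1.2x the AI providers' API pricing"

def estimate_cost(tokens: dict, rates: dict, margin: float = 1.0) -> float:
    """Sum each token category times its per-million rate, then apply the margin."""
    api_cost = sum(tokens[category] / 1_000_000 * rates[category] for category in tokens)
    return api_cost * margin

# Example: the 2-billion-cache-read request mentioned earlier in the thread.
usage = {"input": 6_600_000, "cache_write": 0,
         "cache_read": 2_000_000_000, "output": 10_000_000}
print(f"API cost:    ${estimate_cost(usage, SONNET_RATES):.2f}")
print(f"Cursor cost: ${estimate_cost(usage, SONNET_RATES, CURSOR_MARGIN):.2f}")
```

Even at the discounted cache-read rate, 2 billion read tokens dominate the bill, which is why the inflated cache-read counts matter so much.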

Yes, my cache read is also 14x higher than it should be.

Screenshot 2025-07-23 at 11.14.47

Cursor is a hot mess at the moment; I hope they sort out their pricing and token overage problems.
Their investors must be mighty upset at the team for blowing this.

Current billing period: Jul 1, 2025 - Aug 1, 2025

Pro comes with at least $20 of included usage per month. We work closely with the model providers to make this monthly allotment as high as possible. You’ll be notified in-app when you’re nearing your monthly limit.

| Model | Input (w/ Cache Write) | Input (w/o Cache Write) | Cache Read | Output | Total Tokens | API Cost | Cost to You |
|---|---|---|---|---|---|---|---|
| auto | 806,793 | 26,568,273 | 126,073,173 | 735,109 | 154,183,348 | $119.33 | $0 |
| claude-4-sonnet-thinking | 9,107,286 | 1,756,097 | 110,856,664 | 1,200,555 | 122,920,602 | $93.15 | $0 |
| o3 | 0 | 14,897,855 | 39,395,088 | 313,239 | 54,606,182 | $61.11 | $0 |
| claude-4-opus-thinking | 504,696 | 58,345 | 7,634,189 | 118,350 | 8,315,580 | $36.68 | $0 |
| o3-pro | 0 | 496,452 | 0 | 1,318 | 497,770 | $12.04 | $0 |
| gemini-2.5-pro-preview-06-05 | 0 | 1,755,290 | 18,796,628 | 81,669 | 20,633,587 | $10.11 | $0 |
| gemini-2.5-pro-preview-05-06 | 0 | 525,830 | 3,387,657 | 15,895 | 3,929,382 | $2.05 | $0 |
| claude-4-sonnet | 261,864 | 49,976 | 1,544,270 | 11,789 | 1,867,899 | $1.85 | $0 |
| claude-3.5-haiku | 439,611 | 107,403 | 3,655,389 | 37,914 | 4,240,317 | $1.21 | $0 |
| gemini-2.5-pro | 0 | 27,524 | 183,699 | 3,254 | 214,477 | $0.15 | $0 |
| o4-mini | 0 | 71,725 | 74,216 | 2,388 | 148,329 | $0.11 | $0 |
| Total | 11,120,250 | 46,314,770 | 311,600,973 | 2,521,480 | 371,557,473 | $337.79 | $0 |
Is this a bit too much?
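
For context on why that table looks so lopsided: the totals row shows cache reads making up roughly 84% of all tokens billed. A quick sketch to compute the share of each category (figures copied from the totals row above):

```python
# Share of each token category in the usage table above (totals row).
totals = {
    "input_w_cache_write": 11_120_250,
    "input_wo_cache_write": 46_314_770,
    "cache_read": 311_600_973,
    "output": 2_521_480,
}
grand_total = sum(totals.values())  # 371,557,473 tokens

for category, count in totals.items():
    print(f"{category:22s} {count:>13,}  {count / grand_total:6.1%}")
```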

Any news, @danperks?

Since this is a pivotal pricing issue reported by many users, I trust it will TRUTHFULLY be treated as a priority, right?

I’ve been waiting 2 days for answers to my basic questions about this. They have answered other questions but skipped over mine so far. Hoping they address it.


Hey, you can see my response to the high cache read query here:

To summarise, your conversation is re-read by the LLM on every tool call - this is the same with all tools that have the same agentic workflow, not just Cursor - which accounts for the high read token throughput.
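
To illustrate how that adds up, here is a simplified model (an assumption for illustration, not Cursor’s actual implementation): if every tool call re-sends the entire conversation so far, with everything already seen served from the prompt cache, cumulative input grows roughly quadratically with the number of tool calls.

```python
# Simplified model: each tool call re-sends the whole conversation so far,
# with everything sent on the previous call read back from the prompt cache.

def cumulative_tokens(initial_context: int, tool_output: int, calls: int):
    """Return (total_input_tokens, cache_read_tokens) after `calls` tool calls."""
    context = initial_context
    total_input = 0
    cache_read = 0
    for i in range(calls):
        total_input += context            # whole conversation re-sent on each call
        if i > 0:
            # everything from the previous call is cache-written, so it is read
            # back from cache now; only the newest tool output is fresh input
            cache_read += context - tool_output
        context += tool_output            # tool result appended to the conversation
    return total_input, cache_read

# Example: 20k tokens of initial context, 5k tokens returned per tool call.
for n in (5, 20, 50):
    total, cached = cumulative_tokens(20_000, 5_000, n)
    print(f"{n:>2} tool calls -> {total:>9,} input tokens ({cached:,} read from cache)")
```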


@danperks could you please explain why I was charged $109.70 for this prompt when the token pricing gives $4.68?


:1234: Token Counts and Pricing

| Category | Token Count | Rate (/1M tokens) | Formula | Cost (USD) |
|---|---|---|---|---|
| Input | 148,145 | $3.00 | (148,145 / 1,000,000) × 3.00 | $0.44 |
| Output | 25,587 | $15.00 | (25,587 / 1,000,000) × 15.00 | $0.38 |
| Cache write | 304,959 | $3.75 | (304,959 / 1,000,000) × 3.75 | $1.14 |
| Cache read | 9,065,721 | $0.30 | (9,065,721 / 1,000,000) × 0.30 | $2.72 |

:money_bag: Total Cost

$0.44 + $0.38 + $1.14 + $2.72 = $4.68


Hi Dan,

Thanks for the response, but it still doesn’t answer my main questions. There are hundreds of people actively expressing this same issue, and likely tens of thousands more who are experiencing it.

  1. Why is there such a significant discrepancy between advertised request limits and actual usage, specifically for clients whose accounts used to last all month and now last a fraction of that with the same usage on the same projects? These are direct before/after comparisons showing a 10x token drain. Why did the previous version of Cursor not do that, and why can’t we have the option to go back?
  • Pro+ advertises 675 requests → I received 60-70 before running out of credits in 7 days, on a Pro+ that used to last me all month with leftover credits on the same project/tasks. I reran previous tasks, with backups of my code base that previously showed correct usage rates and that now show the 10x drain.
  • Pro advertises 225 requests → users who used to get a month’s usage report depletion in 2-3 days
  2. Can you provide an option to revert to the previous settings that didn’t cause excessive cache read consumption and the 10x drain? Michael Truell said legacy users could opt to switch back to 500 requests a month on June 16th, 2025 (Link)

Technically, something is causing this 10x increase in token usage for the same work on the same dev project with the same context. People who used to get 100% of the expected usage are now getting about 10% of it, measured against the median request usage expectations listed at Cursor – Models & Pricing.

" ### Expected usage within limits

Expected usage within limits for the median user per month:

  • Pro: ~225 Sonnet 4 requests, ~550 Gemini requests, or ~650 GPT 4.1 requests
  • Pro+: ~675 Sonnet 4 requests, ~1,650 Gemini requests, or ~1,950 GPT 4.1 requests
  • Ultra: ~4,500 Sonnet 4 requests, ~11,000 Gemini requests, or ~13,000 GPT 4.1 requests "

Thanks in advance