Cache Read Bug using 90+ Million Tokens (120x more than it should)

Yes, it’s conjecture, but I’ve been wondering the same thing. I always wondered how they paid for all the free user tiers as well. Pricing for Amazon’s Kiro.dev has been floating around: Pro ($19/mo for 1,000 agent interactions) and Pro+ ($39/mo for 3,000). Not sure if it’s correct, but that would be another company offering Claude 4 Sonnet at prices and usage rates better than what we are used to.

If Cursor isn’t going to properly fix Pro and Pro+ accounts that used to last all month but now drain in 3 and 7 days respectively, I see a mass exodus. Very few people are going to be able to pay thousands of dollars a month to use Cursor.

This is incredibly unfortunate. If I had to choose an IDE it would be Cursor, and I would pay the $200/mo Ultra subscription all year long to use Cursor with the same usage rates as before, which lasted all month.

@danperks @Zackh1998

The Cache Read bug is getting worse since the 2 billion token screenshot above. People are reporting that even with 0 cache writes, cache reads were still 43 million tokens! What cache is being read if there was no cache write?

Direct Link to Post: Frustrated with Cursor’s Sudden Token Drain and Access Restrictions - #187 by ihate_Dave


FYI, I added an Anthropic API key and kept using Claude 4 Sonnet MAX Thinking, and my cost is way, way down (normal-like).

This is from Anthropic’s console.
Pure Cursor usage is the last hour, I think; if it’s both hours, the difference is even crazier, because it all cost me $7.97 :police_car_light:

@danperks @Zackh1998 We are still waiting on your answers to these 3 simple questions. Please provide clarity to get to the bottom of this:

  1. What technical changes have caused this 10x increase in token consumption? Users are performing the same development work on the same projects with identical context.
  2. Why is there such a significant discrepancy between advertised request limits and actual usage?
  • Pro+ advertises 675 requests → I received 60-70, when it used to last all month with leftover tokens
  • Pro advertises 225 requests → users who used to get a month’s usage report depletion in 2-3 days
  3. Can you provide an option to revert to the previous settings that didn’t cause excessive cache read consumption and the 10x drain?

Technically, something is causing this 10x increase in token usage for the same work on the same dev project with the same context. Can’t you offer the option to turn back on the previous settings that didn’t cause a 10x cache read drain? I really hope this isn’t a bait and switch to get users to pay full API fees rather than usage rates that match the number of requests our subscriptions provide, as listed at Cursor – Models & Pricing.

Expected usage within limits

Expected usage within limits for the median user per month:

  • Pro: ~225 Sonnet 4 requests, ~550 Gemini requests, or ~650 GPT 4.1 requests
  • Pro+: ~675 Sonnet 4 requests, ~1,650 Gemini requests, or ~1,950 GPT 4.1 requests
  • Ultra: ~4,500 Sonnet 4 requests, ~11,000 Gemini requests, or ~13,000 GPT 4.1 requests

Cursor is down for me, so here are new stats from using the API key:



I’ve just sent a request for investigation and refund of my use in May, June and July…

On average I paid ca. $0.73/Mtok via API key vs. roughly $10-15/Mtok directly via Cursor :police_car_light:


I think Zackh1998’s post perfectly corroborates my previous findings. As he said:

But the system prompt encourages the model to call tools to understand all the details, encourages it to explore multiple implementation methods and multiple search keywords, forces it to use different keywords across multiple searches, requires it to break the task down into subqueries repeatedly, and requires it to use tools as much as possible. In a complex project, such prompts will inevitably lead to explosive growth in token usage.

But I think the situation might be even worse. In a slightly more complex project, to understand every detail of a particular file, it might even be necessary to read half of the project’s files. This could cause the initial 20,000 tokens of context added by the user to expand to an incomprehensible size, far more than just 30,000. Then, because the system encourages or even forces the model to make more tool calls, this massive context is repeatedly included, ultimately leading to enormous token consumption.

P.S.: Zackh1998’s post merely explains that the ratio of input tokens to cached tokens is normal (1:9), but it avoids the most important question: why such massive token usage occurred in the first place. According to his post, cache reads can be understood as inexpensive input tokens. Price aside, cache reads are essentially input. That is, for the friend who consumed 2 billion cache read tokens, the actual input for that request was over 2 billion plus 6.6 million tokens, while only about 10 million tokens were output. The input-output ratio reaches 200:1, which is far too absurd.

Additionally, I have another question. Someone mentioned in a post that their Pro plan ran out of tokens after using $86 worth. I remember a developer once replied to someone in the forum saying that he had used more than $150 worth of tokens. So why did my Pro plan run out after spending $64? I want to know what caused such a significant discrepancy. Let’s set aside whether the pricing is reasonable for now; at the very least, every user should be treated fairly, right?

An update from the team on the token consumption is in the other thread:


For Claude 4, the reported total spend is $1,019.88, but the token-based calculation gives $850.98. The difference is $168.90, almost 20% extra.

Gemini charged me for cache READs while doing ZERO cache WRITEs…

Cursor charges a 20% fee; not sure if it’s included in the calculations, but my guess is yes.
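
If that 20% fee is applied on top of the token-based cost from the post above, the numbers nearly line up. Here is a quick check; the two dollar figures are copied from that post, and the 20% markup is the assumption being tested:

```python
# Does a 20% markup on the token-based API cost explain the reported spend?
token_based_cost = 850.98   # cost computed from token counts (post above)
reported_spend = 1019.88    # total spend reported by the dashboard
with_markup = token_based_cost * 1.2
print(f"token cost x 1.2 = ${with_markup:.2f}  (reported: ${reported_spend:.2f})")
# -> $1021.18, within about $1.30 of the reported spend, which suggests
#    the 20% fee is indeed already baked into the reported number.
```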

@CK-iRonin.IT I can provide more info:

  • Token price in Cursor is 1.2× the AI providers’ API pricing.
  • Gemini does not report cache writes separately, so input includes cache writes. But Gemini charges more for cache reads (25% of the input price instead of Anthropic’s 10%).
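
For anyone who wants to sanity-check their own numbers, here is a minimal cost-estimator sketch based on the figures in this thread. The per-token rates are assumed Anthropic Sonnet 4 list prices, and the 1.2× margin comes from the bullet above; this is a back-of-the-envelope model, not Cursor’s actual billing logic.

```python
# Rough cost estimator for token usage, using assumed Sonnet 4 list prices
# (USD per 1M tokens) and the 1.2x Cursor margin mentioned above.
SONNET_RATES = {
    "input": 3.00,
    "cache_write": 3.75,
    "cache_read": 0.30,   # 10% of the input price (Anthropic)
    "output": 15.00,
}
CURSOR_MARGIN = 1.2       # "Token price in Cursor is 1.2x the AI providers' API pricing"

def estimate_cost(tokens: dict, rates: dict, margin: float = 1.0) -> float:
    """Sum each token category times its per-million rate, then apply the margin."""
    api_cost = sum(tokens[category] / 1_000_000 * rates[category] for category in tokens)
    return api_cost * margin

# Example: the 2-billion-cache-read request mentioned earlier in the thread.
usage = {"input": 6_600_000, "cache_write": 0,
         "cache_read": 2_000_000_000, "output": 10_000_000}
print(f"API cost:    ${estimate_cost(usage, SONNET_RATES):.2f}")
print(f"Cursor cost: ${estimate_cost(usage, SONNET_RATES, CURSOR_MARGIN):.2f}")
```

Even at the discounted cache-read rate, 2 billion read tokens dominate the bill, which is why the inflated cache-read counts matter so much.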

Yes, my cache read is also 14x higher than it should be.

Screenshot 2025-07-23 at 11.14.47

Cursor is a hot mess at the moment; I hope they sort out their pricing and token overage problems.
Their investors must be mighty upset at the team for blowing this.

Current billing period: Jul 1, 2025 - Aug 1, 2025

Pro comes with at least $20 of included usage per month. We work closely with the model providers to make this monthly allotment as high as possible. You’ll be notified in-app when you’re nearing your monthly limit.

| Model | Input (w/ Cache Write) | Input (w/o Cache Write) | Cache Read | Output | Total Tokens | API Cost | Cost to You |
|---|---|---|---|---|---|---|---|
| auto | 806,793 | 26,568,273 | 126,073,173 | 735,109 | 154,183,348 | $119.33 | $0 |
| claude-4-sonnet-thinking | 9,107,286 | 1,756,097 | 110,856,664 | 1,200,555 | 122,920,602 | $93.15 | $0 |
| o3 | 0 | 14,897,855 | 39,395,088 | 313,239 | 54,606,182 | $61.11 | $0 |
| claude-4-opus-thinking | 504,696 | 58,345 | 7,634,189 | 118,350 | 8,315,580 | $36.68 | $0 |
| o3-pro | 0 | 496,452 | 0 | 1,318 | 497,770 | $12.04 | $0 |
| gemini-2.5-pro-preview-06-05 | 0 | 1,755,290 | 18,796,628 | 81,669 | 20,633,587 | $10.11 | $0 |
| gemini-2.5-pro-preview-05-06 | 0 | 525,830 | 3,387,657 | 15,895 | 3,929,382 | $2.05 | $0 |
| claude-4-sonnet | 261,864 | 49,976 | 1,544,270 | 11,789 | 1,867,899 | $1.85 | $0 |
| claude-3.5-haiku | 439,611 | 107,403 | 3,655,389 | 37,914 | 4,240,317 | $1.21 | $0 |
| gemini-2.5-pro | 0 | 27,524 | 183,699 | 3,254 | 214,477 | $0.15 | $0 |
| o4-mini | 0 | 71,725 | 74,216 | 2,388 | 148,329 | $0.11 | $0 |
| Total | 11,120,250 | 46,314,770 | 311,600,973 | 2,521,480 | 371,557,473 | $337.79 | $0 |
Is this a bit too much?
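
For context on why that table looks so lopsided: the totals row shows cache reads making up roughly 84% of all tokens billed. A quick sketch to compute the share of each category (figures copied from the totals row above):

```python
# Share of each token category in the usage table above (totals row).
totals = {
    "input_w_cache_write": 11_120_250,
    "input_wo_cache_write": 46_314_770,
    "cache_read": 311_600_973,
    "output": 2_521_480,
}
grand_total = sum(totals.values())  # 371,557,473 tokens

for category, count in totals.items():
    print(f"{category:22s} {count:>13,}  {count / grand_total:6.1%}")
```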

Any news, @danperks?

Since this is a pivotal pricing issue reported by many users, I trust it will TRUTHFULLY be treated as a priority, right?

I’ve been waiting 2 days for answers to my basic questions about this. They have answered other questions but skipped over mine so far. Hoping they address it.


Hey, you can see my response to the high cache read query here:

To summarise, your conversation is re-read by the LLM on every tool call - this is the same with all tools that have the same agentic workflow, not just Cursor - which accounts for the high read token throughput.
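
To illustrate how that adds up, here is a simplified model (an assumption for illustration, not Cursor’s actual implementation): if every tool call re-sends the entire conversation so far, with everything already seen served from the prompt cache, cumulative input grows roughly quadratically with the number of tool calls.

```python
# Simplified model: each tool call re-sends the whole conversation so far,
# with everything sent on the previous call read back from the prompt cache.

def cumulative_tokens(initial_context: int, tool_output: int, calls: int):
    """Return (total_input_tokens, cache_read_tokens) after `calls` tool calls."""
    context = initial_context
    total_input = 0
    cache_read = 0
    for i in range(calls):
        total_input += context            # whole conversation re-sent on each call
        if i > 0:
            # everything from the previous call is cache-written, so it is read
            # back from cache now; only the newest tool output is fresh input
            cache_read += context - tool_output
        context += tool_output            # tool result appended to the conversation
    return total_input, cache_read

# Example: 20k tokens of initial context, 5k tokens returned per tool call.
for n in (5, 20, 50):
    total, cached = cumulative_tokens(20_000, 5_000, n)
    print(f"{n:>2} tool calls -> {total:>9,} input tokens ({cached:,} read from cache)")
```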


@danperks could you please explain why I was charged $109.70 for this prompt when the token pricing gives $4.68?


:1234: Token Counts and Pricing

| Category | Token Count | Rate (/1M tokens) | Formula | Cost (USD) |
|---|---|---|---|---|
| Input | 148,145 | $3.00 | (148,145 / 1,000,000) × 3.00 | $0.44 |
| Output | 25,587 | $15.00 | (25,587 / 1,000,000) × 15.00 | $0.38 |
| Cache write | 304,959 | $3.75 | (304,959 / 1,000,000) × 3.75 | $1.14 |
| Cache read | 9,065,721 | $0.30 | (9,065,721 / 1,000,000) × 0.30 | $2.72 |

:money_bag: Total Cost

$0.44 + $0.38 + $1.14 + $2.72 = $4.68


Hi Dan,

Thanks for the response, but it still doesn’t answer my main questions. There are hundreds of people actively expressing this same issue, and likely tens of thousands more who are experiencing it.

  1. Why is there such a significant discrepancy between advertised request limits and actual usage, specifically for clients whose accounts used to last all month and now last a fraction of that with the same usage on the same projects? These are direct before/after comparisons showing a 10x token drain. Why did the previous version of Cursor not do that, and why can’t we have the option to go back?
  • Pro+ advertises 675 requests → I received 60-70 before running out of credits in 7 days, on a Pro+ that used to last me all month with leftover credits on the same project/tasks. I reran previous tasks, with backups of my code base that previously showed correct usage rates and that now show the 10x drain.
  • Pro advertises 225 requests → users who used to get a month’s usage report depletion in 2-3 days
  2. Can you provide an option to revert to the previous settings that didn’t cause excessive cache read consumption and the 10x drain? Michael Truell said legacy users could opt to switch back to 500 requests a month on June 16th, 2025 (Link)

Technically, something is causing this 10x increase in token usage for the same work on the same dev project with the same context. People who used to get 100% of the expected usage are now getting about 10% of it, measured against the median request usage expectations listed at Cursor – Models & Pricing.

" ### Expected usage within limits

Expected usage within limits for the median user per month:

  • Pro: ~225 Sonnet 4 requests, ~550 Gemini requests, or ~650 GPT 4.1 requests
  • Pro+: ~675 Sonnet 4 requests, ~1,650 Gemini requests, or ~1,950 GPT 4.1 requests
  • Ultra: ~4,500 Sonnet 4 requests, ~11,000 Gemini requests, or ~13,000 GPT 4.1 requests "

Thanks in advance