I have multiple single-shot requests that are costing me $15 ~ $18 each…
I understand I’m using Opus 4.6 with MAX, but it never used to cost more than $0.80 ~ $1.50 per request. Now it’s 10x?
Is this normal? A bug?
You were doing 10M-token requests with Opus for $1.50?
Can you hover over the larger requests and show the token breakdown?
My Auto requests are like 99% cache reads. I wonder what yours are.
Suppose the 10M tokens are:
9M cached input
0.5M new input
0.5M output
Cost (rates are per 1M tokens):
Cached input → 9 × $0.50 = $4.50
New input → 0.5 × $5 = $2.50
Output → 0.5 × $25 = $12.50
Total ≈ $19.50
Example:
9.8M cached
0.1M new input
0.1M output
Cost (rates are per 1M tokens):
Cached → 9.8 × $0.50 = $4.90
Input → 0.1 × $5 = $0.50
Output → 0.1 × $25 = $2.50
Total ≈ $7.90
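To double-check the arithmetic in both examples, here is a small calculator. It's a sketch that assumes the per-1M-token rates used above ($0.50 cache read, $5 input, $25 output); the names RATES_PER_MTOK and request_cost are just illustrative, and actual rates should be checked on Cursor's pricing page.

```python
# Per-1M-token rates used in the examples above (assumed; check the
# current pricing before relying on them).
RATES_PER_MTOK = {
    "cache_read": 0.50,
    "input": 5.00,
    "output": 25.00,
}

def request_cost(tokens: dict[str, int]) -> float:
    """Dollar cost of one request, given token counts per category."""
    return sum(RATES_PER_MTOK[kind] * count / 1_000_000
               for kind, count in tokens.items())

example_1 = {"cache_read": 9_000_000, "input": 500_000, "output": 500_000}
example_2 = {"cache_read": 9_800_000, "input": 100_000, "output": 100_000}

print(f"${request_cost(example_1):.2f}")  # $19.50
print(f"${request_cost(example_2):.2f}")  # $7.90
```

Note how much the output share matters: example 2 has slightly more total tokens in cache but far fewer output tokens, and comes out at well under half the cost.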
Thanks, this is useful. So are those prices roughly how they calculate it?
I’m not sure how many tokens, but the requests were essentially the same scale (though obviously that doesn’t mean it didn’t fetch/use many more input tokens).
All I meant was that the more I’m paying this week, the more expensive the requests have gotten.
Hover over the tokens and it should show you the breakdown.
Yes, those examples should roughly match how Cursor charges per request. This is from their website.
It would be worth comparing them to last week’s token usage and breakdown. Maybe the requests in general are using different amounts or ratios of tokens than normal.
This was the $18.40 one… am I being charged for cache reads or what? There’s no way 20,079 output tokens should cost that much, right?
Yea this doesn’t add up.
Should be like $8 at most.
Breakdown by % of cost
Cache read: ~89% ($6.94)
Output tokens: ~6% ($0.50)
Cache write: ~4% ($0.35)
Input tokens: ~0%
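The percentages can be recomputed from the dollar amounts, and where a rate from the earlier examples is known, the token count can be backed out. This is a sketch assuming the same $0.50 / $5 / $25 per-1M rates; the cache-write rate isn't given in this thread, so that line gets no token estimate. implied_tokens is a hypothetical helper name.

```python
# Dollar amounts from the cost breakdown above.
costs = {"cache_read": 6.94, "output": 0.50, "cache_write": 0.35, "input": 0.0}

# Per-1M-token rates quoted earlier in the thread (assumed). The cache-write
# rate wasn't given, so it is left out of the back-calculation.
RATES_PER_MTOK = {"cache_read": 0.50, "output": 25.00, "input": 5.00}

def implied_tokens(dollars: float, rate_per_mtok: float) -> int:
    """Invert cost = tokens / 1M * rate to recover the token count."""
    return round(dollars / rate_per_mtok * 1_000_000)

total = sum(costs.values())  # about $7.79
for kind, dollars in costs.items():
    line = f"{kind:<12} ${dollars:5.2f}  {dollars / total:6.1%}"
    if kind in RATES_PER_MTOK:
        line += f"  ~{implied_tokens(dollars, RATES_PER_MTOK[kind]):,} tokens"
    print(line)
```

The output line backs out to about 20,000 tokens, which lines up with the 20,079 output tokens in the screenshot, while cache reads back out to roughly 13.9M tokens. So at these rates, most of the cost comes from cache reads, not output.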
There has to be something else going on. Do these requests have to be “MAX”? Maybe “MAX” is adding charges that aren’t normal.
Max Mode uses token-based pricing at the model’s API rate, so it consumes usage faster than the default context window. On individual plans, a 20% upcharge is added to the model’s API rate.
Based on @y4my4my4m’s screenshot, Max Mode is enabled!
When using Max Mode, requests are twice as expensive when input exceeds 200k tokens, so I think the costs are largely expected here (if you double the manual calculations, you arrive at the number shown in the dashboard)!
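One way to read the two Max Mode rules quoted above (2x when input exceeds 200k tokens, plus a 20% upcharge on individual plans) is as multipliers on the base API cost. This is only a sketch under stated assumptions: the thread doesn't say exactly which tokens count toward the 200k threshold, whether it is checked per underlying API call, or whether the upcharge is already included in the dashboard figure. max_mode_cost and the constant names are hypothetical.

```python
LONG_CONTEXT_THRESHOLD = 200_000  # input tokens, per the post above
LONG_CONTEXT_MULTIPLIER = 2.0     # "twice as expensive" (simplification)
INDIVIDUAL_PLAN_UPCHARGE = 0.20   # 20% on the model's API rate

def max_mode_cost(base_cost: float, input_tokens: int,
                  individual_plan: bool = True) -> float:
    """Apply the assumed Max Mode multipliers to a base API cost.

    input_tokens is taken to be the full prompt size of the call,
    cached tokens included (an assumption, not documented here).
    """
    cost = base_cost
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        cost *= LONG_CONTEXT_MULTIPLIER
    if individual_plan:
        cost *= 1 + INDIVIDUAL_PLAN_UPCHARGE
    return cost

# Doubling the ~$7.79 breakdown above, without the upcharge:
print(f"${max_mode_cost(7.79, 300_000, individual_plan=False):.2f}")  # $15.58
```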
Where is the pricing of Max Mode explained?
I don’t see much about it on the pricing page. Is it really as simple as 2x normal cost?
Just curious, what stops a generation from costing something like $250 ~ $350 in one request? Say, if the cache can grow into the millions of tokens. What are the limits?
For example, if I leave it running on a long plan with many features.
Can it just keep going up indefinitely?
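A plausible reason a single request can reach millions of cache-read tokens is that an agent re-sends the whole conversation on every tool-call turn. The sketch below assumes exactly that: each turn re-reads the entire context so far from cache and then appends a fixed number of new tokens. It ignores context compaction, the context-window limit, and any turn caps Cursor may enforce, and agent_run_tokens is a hypothetical name.

```python
def agent_run_tokens(turns: int, base_context: int, new_per_turn: int):
    """Hypothetical agent loop where every turn re-reads the whole context.

    Assumes the previous context is served from cache (billed as cache
    reads) and each turn appends new_per_turn fresh tokens.
    Returns (total cache-read tokens, final context size).
    """
    cache_read = 0
    context = base_context
    for _ in range(turns):
        cache_read += context        # whole prefix re-read from cache
        context += new_per_turn      # tool results / model output appended
    return cache_read, context

reads, final = agent_run_tokens(turns=50, base_context=100_000, new_per_turn=5_000)
print(f"{reads:,} cache-read tokens, final context {final:,}")
# 11,125,000 cache-read tokens, final context 350,000
```

Because every turn re-reads a growing prefix, cumulative cache reads grow roughly quadratically with the number of turns: under these assumptions, doubling to 100 turns gives 34,750,000 cache-read tokens, about 3x as many, not 2x. Nothing in this simple model stops the total from growing; only external limits would.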
You have to hover over the (!) icon on each model; it shows descriptions of legacy models and the 200k+ token pricing. I think Claude’s own pricing page uses the same pricing method. Still, I feel like we need some kind of limiter or cost reminder per process, because the fact that one request can use millions of tokens raises some questions. Does that mean it’s caching the entire codebase? Is indexing basically caching all of it? And does caching here work the same way as Claude’s, where the first request costs more? How long does the cache stay warm before it expires?
PS: I don’t know if I missed the answers to these questions in the docs or in other forum discussions.
Does this mean it wasn’t supposed to charge 2x above 200k?
CC: @Colin
It came out yesterday, and my question is about requests from 3 days ago.
I don’t know when this promotion went into effect, but your previous request may have been made before it. Just saying for now: maybe you won’t be charged 2x, so go ham.