For the new MAX mode pricing shown below, what exactly does this mean in terms of cost? It looks like it just uses requests rather than pay-per-use, but that raises a few immediate questions.
What happens when a user runs out of fast requests: does MAX mode stop working? Does it switch to usage-based pricing? Does it just use slow requests? Is there a way to opt into usage-based pricing earlier to avoid burning through all of your fast requests in 20-50 MAX requests?
I understand this pricing model is meant to be simpler, but I feel information on the implementation details is lacking. I haven't updated to the new version yet because I want to avoid accidentally burning through all of my fast credits immediately.
Any clarification would be greatly appreciated!
P.S. I know usage-based pricing is cost + 20%, but if the actual price per model could be listed somewhere under usage-based pricing, that'd be very helpful.
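To make the cost + 20% rule concrete, here is a minimal sketch of how a single charge would be computed, assuming the model is billed per million input and output tokens. The rates passed in are hypothetical placeholders, not Cursor's or any provider's actual prices (which is exactly what the question asks to have listed).

```python
# Minimal sketch of "provider cost + 20%" usage-based pricing.
# Assumption: the model is billed per million input/output tokens.
# All rates used below are hypothetical placeholders, not published prices.

MARKUP = 0.20  # the "+20%" from the thread


def provider_cost(input_tokens: int, output_tokens: int,
                  input_per_mtok: float, output_per_mtok: float) -> float:
    """Raw model cost in dollars for one call, given per-million-token rates."""
    return (input_tokens * input_per_mtok
            + output_tokens * output_per_mtok) / 1_000_000


def usage_price(input_tokens: int, output_tokens: int,
                input_per_mtok: float, output_per_mtok: float) -> float:
    """Charge under the assumed cost + 20% rule."""
    cost = provider_cost(input_tokens, output_tokens,
                         input_per_mtok, output_per_mtok)
    return cost * (1 + MARKUP)


# 100k input + 5k output tokens at hypothetical $3 / $15 per million tokens.
print(f"${usage_price(100_000, 5_000, 3.0, 15.0):.4f}")
```

With a published per-model rate table, the same two functions would give the per-call price directly.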
Usage-based pricing does not prevent you from running out of requests. You'll still burn through your requests in 20-50 MAX prompts. It lets you pay for additional requests when you run out instead of switching to free, slow requests. The pricing information doesn't mention enabling it before you run out of requests (just that you have to enable it when you run out to keep using fast requests), but you can turn it on or off at any time.
The new pricing method is quite expensive in terms of request usage. I used up hundreds of requests for very little coding, which wasn't an issue with previous versions.
Given this, it's becoming unaffordable, and competitors are gaining an edge because of it.
I hope Cursor will reconsider its approach to usage-based costing.
Agreed. With non-MAX mode, pricing is very predictable, and with the right prompts you can do a lot of coding with 500 requests.
For MAX mode, pricing depends on tokens, so sending unnecessary context consumes tokens quickly. Some models support cached tokens (cheaper than the first submission) for prompt content they have already received when you continue a chat.
Normal mode: Fixed cost per message, super predictable
MAX mode: Token-based pricing, can get expensive if sending lots of context
The token caching in MAX mode is pretty neat: when the model sees the same content again (like in follow-up messages), it only charges about 10% of the original token cost. But yeah, if you're working with a tight request budget, sticking to normal mode is your best bet.
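A rough way to see why caching matters: if each follow-up re-sends the whole conversation, and content the model has already seen is billed at about 10% of the normal rate, later turns cost mostly their new tokens. This is a sketch under simplifying assumptions (only input tokens counted, every earlier token is a cache hit, a flat 10% factor); real cache behavior differs by model.

```python
# Sketch: effective input tokens for a chat where every turn re-sends the
# full history. Assumptions: only input tokens are counted, every token from
# earlier turns is a cache hit, and cache hits cost a flat ~10% (the figure
# quoted in the thread; actual discounts depend on the model).

CACHE_FACTOR = 0.10


def conversation_input_tokens(new_tokens_per_turn: list[int],
                              cache_factor: float = CACHE_FACTOR) -> float:
    """Sum of full-price-equivalent input tokens across all turns."""
    total = 0.0
    history = 0  # tokens sent on earlier turns (assumed cached)
    for new in new_tokens_per_turn:
        total += history * cache_factor + new
        history += new
    return total


turns = [20_000, 1_000, 1_000]  # big first message, two short follow-ups
print("with caching:   ", conversation_input_tokens(turns))
print("without caching:", conversation_input_tokens(turns, cache_factor=1.0))
```

Under these assumptions the two follow-ups cost the equivalent of about 6,100 full-price tokens instead of 43,000.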
- What happens when a user runs out of fast requests: does MAX mode stop working? Does it become usage-based pricing? Does it just use slow requests?
If a user uses all of their fast requests, they will no longer have access to MAX unless they enable usage-based pricing. There is no slow pool for MAX mode requests.
- Is there a way to opt into usage-based pricing earlier to avoid burning through fast requests in 20-50 MAX requests?
Unfortunately not. However, your end-of-month bill would hypothetically be the same.
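The "same bill either way" point can be checked with a small model of the billing order: requests come out of the included pool first, and anything beyond it is charged per request. Assuming one flat overage price per request (the base fee and price below are made-up numbers; only the 500 included requests comes from this thread), the total depends only on how many requests you use in the month, not on whether MAX or normal prompts used up the pool first.

```python
# Sketch of "included fast requests first, then usage-based" billing.
# INCLUDED_REQUESTS (500) comes from the thread; BASE_FEE and
# PRICE_PER_REQUEST are hypothetical placeholders. Assumes one flat
# overage price per request regardless of mode.

INCLUDED_REQUESTS = 500
BASE_FEE = 20.0           # hypothetical monthly subscription, in dollars
PRICE_PER_REQUEST = 0.04  # hypothetical overage price per request, in dollars


def monthly_bill(request_costs: list[float]) -> float:
    """Bill for a month, drawing each prompt's requests from the pool first."""
    remaining = float(INCLUDED_REQUESTS)
    overage = 0.0
    for cost in request_costs:
        from_pool = min(cost, remaining)
        remaining -= from_pool
        overage += cost - from_pool
    return BASE_FEE + overage * PRICE_PER_REQUEST


max_prompts = [25.0] * 20      # 20 MAX prompts at ~25 requests each
normal_prompts = [1.0] * 300   # 300 normal-mode messages at 1 request each

# Same total usage in a different order gives the same bill.
print(monthly_bill(max_prompts + normal_prompts))
print(monthly_bill(normal_prompts + max_prompts))
```

If MAX overage were priced differently from normal overage, the order would start to matter, so the equivalence rests on that flat-price assumption.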
Obviously, staying out of MAX mode makes costs predictable, but that does not address the OP's (or my) question: for those times when MAX mode is necessary, how can we understand the costs of the various models?
Also, I see the only MAX model is Sonnet. Did Gemini go away with the last update?
Not sure if this helps, but once you use it, the new dashboard shows how many requests/tokens each individual model call triggered by a single MAX mode prompt consumed. It lists many lines, each stating a cost of, e.g., 0.2 or 2.4 requests, and hovering over a line shows the tokens.
While using MAX mode, I've had some cases where a single prompt ran to 30 requests or so. From what I've seen, I suspect the number of tool calls also plays a role, because Cursor internally re-requests the model (hopefully with many cached tokens) after every single one, such as "list the files in that directory", "show me the installed packages", et cetera.
I think over time we will figure out prompts that avoid the most expensive usage patterns (unless we are willing to pay for them).
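To illustrate the tool-call effect, here is a hypothetical sketch of one agent run: the model is called once for the prompt and once more after each tool result, with everything already sent treated as cached. TOKENS_PER_REQUEST is an invented conversion rate, output tokens are ignored, and the 10% cache factor is the figure quoted earlier in the thread. Each element of the returned list corresponds to one per-call line like those in the dashboard.

```python
# Hypothetical sketch: request cost of each model call in one MAX agent run.
# Assumptions (not Cursor's actual accounting):
#   - the model is called once for the prompt, then again after every tool call
#   - all previously sent tokens are cache hits billed at CACHE_FACTOR
#   - output tokens are ignored
#   - TOKENS_PER_REQUEST is an invented conversion rate

TOKENS_PER_REQUEST = 25_000
CACHE_FACTOR = 0.10


def agent_run_requests(initial_context: int, tool_results: list[int]) -> list[float]:
    """Return one request cost per model call, like the per-call dashboard lines."""
    fresh_per_call = [initial_context, *tool_results]  # new tokens on each call
    costs = []
    cached = 0  # tokens the model has already seen in this run
    for fresh in fresh_per_call:
        effective_tokens = cached * CACHE_FACTOR + fresh
        costs.append(effective_tokens / TOKENS_PER_REQUEST)
        cached += fresh
    return costs


calls = agent_run_requests(60_000, [3_000] * 15)  # prompt + 15 tool calls
for i, cost in enumerate(calls):
    print(f"call {i:2d}: {cost:.2f} requests")
print(f"total: {sum(calls):.2f} requests over {len(calls)} model calls")
```

With these made-up numbers, 15 small tool calls add about 6.7 requests on top of the 2.4 for the initial prompt; larger tool outputs or more calls push a single prompt toward the 30 requests described above.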
Just a follow-up question on the matter: for anyone regularly using any of the MAX models, whether you pay out of pocket or your employer pays, how much do you generally spend per day or per month?
I had a nice experience with occasional calls to the MAX models under the previous pricing model, but I am not sure how quickly the expense builds up when primarily using a MAX model under the new pricing, especially since you can't opt into usage-based pricing for MAX models specifically; they now burn through fast requests first. I'd imagine this would force you into usage-based pricing even for regular models at certain times of day if the slow queue is bad.
Anyway, if anyone is willing to share their experience with heavy max usage, I’d love to hear your thoughts!
It would be a game changer to have a little counter showing how many tokens are being sent in each request (out of the maximum) and how much the request cost after it finished.