MAX MODE is too expensive!

I only @mentioned two files, but Agent Mode did a lot of things (many of them unnecessary 😓).

I have no idea how MAX Mode is currently calculating usage. If it keeps calculating like this, MAX Mode will basically be unusable within Agent.

Most likely the prompt was incorrect or too broad, and the rules aren’t strict or clear enough.


It’s expensive, but it also works great, so please don’t ■■■■ it up if you guys “fix it”.


This billing is very expensive.

Cursor – Models & Pricing


Is it just me, or is the new pricing model very expensive?
Just two MAX mode prompts consumed more than 25% of my total requests. And in the end, it didn’t do the job, so I had to revert.
This is very disappointing.

I don’t know why, but a few days ago MAX mode seemed to be more under control, I would say.

MAX mode uses cached tokens. If the cache is empty, it will send a lot of tokens: the rules, included files, system prompts, and whatever else it thinks it should read. That is easily 60k–120k tokens just on warmup.

If you keep the discussion concise, it will consume fewer tokens because it already has all the data. So the first request will burn a few dollars, then each tool use or update will be a few cents.

Warning: if you do NOT use the session, the CACHE WILL EXPIRE. The window is somewhere in the range of tens of minutes to an hour. If the cache empties, refilling it will cost the full warmup price again.

Normal-mode queries cost about half as much as a single MAX tool call, and that price covers the whole reply, including all of its tool calls.
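
To make the warmup-vs-cached split concrete, here is a minimal sketch under assumed numbers. The per-token prices, the 90k warmup size, and the turn_cost helper are all illustrative assumptions, not Cursor’s actual billing formula:

```python
# Rough model of a cached-prompt session. Prices and token counts are
# illustrative assumptions -- check cursor.com/pricing for real rates.

FRESH_PER_1M = 15.00   # assumed $/1M for uncached input tokens
CACHED_PER_1M = 1.50   # assumed $/1M for cache-read tokens (~10x cheaper)

def turn_cost(fresh_tokens: int, cached_tokens: int) -> float:
    """Cost of one request, split into cold input vs cache hits."""
    return (fresh_tokens * FRESH_PER_1M
            + cached_tokens * CACHED_PER_1M) / 1_000_000

# Turn 1: cold cache -- rules, @mentioned files, and system prompt all
# go in as fresh tokens.
print(f"warmup:   ${turn_cost(fresh_tokens=90_000, cached_tokens=0):.2f}")
# -> warmup:   $1.35

# Later turns: the 90k of context is a cache hit; only the new message
# and tool output are fresh.
print(f"followup: ${turn_cost(fresh_tokens=3_000, cached_tokens=90_000):.2f}")
# -> followup: $0.18

# If the session idles past the cache TTL, the next turn pays the
# warmup price all over again.
```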


I decided to try o3 on a problem I had been trying to solve for two hours with other models. I burned 40 requests and $0.89, and the problem still was not solved. But I learned pain T_T

After trying to solve the same problem with Gemini MAX (June) and Claude MAX (Sonnet 4), not only does Claude use more queries, but:

  • Claude writes more verbose code
  • Claude writes more descriptive comments
  • Claude splits workflow operations into more granular steps
  • Claude thinks many, many more times for each individual code change
  • Claude groups fewer changes into a single query
  • Claude runs build/debug/dry-test/live-test commands unprompted
  • Claude produces a detailed project-management report at the end
  • Claude updates README, CHANGELOG, and FEATURES and explains the changes there
  • Claude takes twice as long to finish
  • Claude appears to read from files more often, even with a bigger cache
  • Claude uses 9 ❗ times more queries
  • Claude uses 4 ❗ times more cache each time
  • Claude costs 12 ❗ times more money (rough math sketched below)
  • The code achieves the same end result
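
As a quick back-of-envelope check on how those multipliers compound (the ratios are the observations above; the interpretation is my own reading):

```python
# Back-of-envelope check on the multipliers above. The ratios are the
# thread's observations, not measured benchmarks.
queries_ratio = 9    # Claude used 9x more queries
cache_ratio = 4      # ...and read 4x more cache each time
cost_ratio = 12      # ...yet cost "only" 12x the money

# If billing scaled with raw token volume, queries x cache would predict:
naive_ratio = queries_ratio * cache_ratio
print(f"naive token-volume prediction: {naive_ratio}x")  # 36x
print(f"observed cost ratio:           {cost_ratio}x")   # 12x

# The 3x gap is consistent with cached reads being billed well below
# fresh input: the 4x-larger reads mostly hit the cheap cached rate,
# so total cost tracks the query count more closely than token volume.
```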

I understand that this behavior could be caused by my rules and existing code structure (Claude wrote 99% of it anyway), and that there is a pressure point where the model is caught between tightly tethered behaviors and becomes neurotic about satisfying every criterion. But I still think the team has not optimized this thoroughly. Maybe a month of work is enough to tune an LLM so that it takes in all the context and answers once, completely and correctly, or maybe Google is simply very good at this game, but I feel there is room for optimization.

I did not test Opus here, but it behaves at a higher level much like Gemini, except it is far more expensive than all the others.


Based on the above analysis, I think something needs to change in MAX mode to bring costs to similar levels across models, because right now the value is inverted.