I only @mentioned two files, but Agent Mode did a lot of things (many of them unnecessary).
I have no idea how MAX Mode is currently calculating usage. If it keeps calculating like this, MAX Mode will basically be unusable within Agent.
Most likely the prompt was incorrect or too broad, and the rules aren't rigid or clear enough.
It's expensive, but it also works great, so please don't break it when you guys "fix it".
MAX Mode uses cached tokens. If the cache is empty, it sends a large payload: the rules, the included files, the system prompts, and whatever else it decides it should read. That is easily 60k-120k tokens just on warmup.
If you keep the discussion concise, it consumes fewer tokens because it already has all the data cached. So the first message burns a few dollars, then each subsequent tool use or update costs a few cents.
Warning: if you do NOT use the session, the CACHE WILL EXPIRE, somewhere in the range of tens of minutes to an hour. Once the cache is emptied, refilling it costs the full warmup price again.
A normal-mode query costs about half as much as a single MAX tool call, and that covers the entire reply, including all tool calls made in normal mode.
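To make the warmup-versus-cached cost difference above concrete, here is a minimal back-of-the-envelope sketch. All rates are hypothetical placeholders (not Cursor's or any provider's actual pricing); the only real numbers are the 60k-120k warmup range quoted above, and the assumption that cached input tokens are billed at a steep discount compared to uncached ones.

```python
# Rough cost sketch for the caching behavior described above.
# Prices below are made-up placeholders, NOT real provider rates.

UNCACHED_PRICE_PER_1K = 0.015   # assumed $/1k tokens for uncached input
CACHED_PRICE_PER_1K = 0.0015    # assumed $/1k tokens for cached input (10x cheaper)

def request_cost(tokens: int, cached: bool) -> float:
    """Cost of sending `tokens` input tokens at the assumed rates."""
    rate = CACHED_PRICE_PER_1K if cached else UNCACHED_PRICE_PER_1K
    return tokens / 1000 * rate

# Cold start: rules, included files, and system prompts, all uncached.
# The thread above puts this at roughly 60k-120k tokens; take the midpoint.
warmup = request_cost(90_000, cached=False)

# A follow-up tool call re-sends the same 90k prefix from cache,
# plus a small amount of genuinely new (uncached) context.
followup = request_cost(90_000, cached=True) + request_cost(2_000, cached=False)

print(f"warmup   ~ ${warmup:.2f}")
print(f"followup ~ ${followup:.2f}")
```

Under these assumed rates, the first uncached message costs on the order of a dollar while each cached follow-up costs a fraction of that, which matches the "few dollars up front, then cents per tool call" pattern described above. If the cache expires, the next request is billed at the warmup price again.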
I decided to try o3 on a problem I had been trying to solve for two hours with other models. I burned 40 requests and $0.89, and the problem still was not solved. But I learned pain T_T
After trying to solve the same problem with Gemini MAX (June) and Claude MAX (Sonnet 4), not only does Claude use more queries, but:
I understand that this behavior could be caused by my rules and existing code structure (Claude wrote 99% of it anyway), and there is a pressure point where the model is caught between tightly tethered behaviors and becomes neurotic about satisfying every criterion. Still, I think the team has not optimized this thoroughly. Maybe a month is not enough to tune an LLM that eats all the context and then answers once, completely and correctly, or maybe Google is just really good at this game, but I feel there is room for optimization.
I did not test Opus here, but it operates at a similarly high level to Gemini, while being much more expensive than all of the others.