I am basically just starting off with Cursor and trying out some new features.
I sent a couple of instructions using the Claude Opus 4.7 High Model. I essentially wanted to see what a high-end model can do.
The first runs were okay, with a few million tokens consumed, but the last two runs were excessive, consuming almost 130M and 90M tokens. The requests were just minor changes (like adding a hover effect, introducing a new color scheme, etc.). It was not drafting anything new; just adjusting and removing elements.
Did anyone else observe such behavior and how can I better control the token consumption?
Hey @Pascal.Engelmann, this is due to how models in general deal with context. Each time you send a message in the same thread, all the previous context is resent to the model so it can give accurate answers. What happened here is that your entire previous chat was resent multiple times, each time adding a bit more to the total. And with Cloud Agents being particularly strong, they each carried lots of context to the next message. Cloud Agents are usually best used to tackle one-off problems, or at least back-and-forth that doesn't need new or different context, if that makes sense. Other people are free to add to my message. Hope this clears things up for you!
Hey, Tom explained it correctly. Token usage grows because every agent step re-sends the entire accumulated context back to the model. In your sessions you had 188 and 119 tool-call rounds, so the context kept getting bigger each time. cache_read tokens are cheaper than normal tokens, but they still count toward usage. Most of your 90 to 130M tokens are actually cache_read.
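To make the growth concrete, here is a rough sketch of the arithmetic. It assumes every round re-sends the whole context and adds a fixed number of new tokens. The function name and the figures for starting context and tokens added per round are made up for illustration; they are not measured from your sessions, and real rounds vary in size.

```python
def total_input_tokens(rounds, base_context, added_per_round):
    """Sum the context sent to the model over all rounds.

    Simplified model (an assumption, not how usage is actually billed):
    round k re-sends base_context plus everything added in earlier rounds.
    """
    total = 0
    context = base_context
    for _ in range(rounds):
        total += context            # the whole context is sent again
        context += added_per_round  # tool output and replies grow it
    return total


# Hypothetical figures: 20k tokens of starting context, 5k added per round.
for rounds in (10, 50, 188):
    print(rounds, total_input_tokens(rounds, 20_000, 5_000))
```

Under this model the total grows roughly with the square of the number of rounds, which is why a session with well over a hundred rounds can land in the tens of millions of tokens even when each individual change is small.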
Also, you’re using the most expensive setup: Claude Opus 4.7 Thinking High with the MAX flag. Thinking mode adds reasoning tokens, MAX expands the context window, and Opus is already much more expensive than Sonnet.
A few practical ways to save:
For small edits like hover tweaks, colors, or minor refactors, Composer 2.5 or Sonnet is enough. Opus High is overkill there.
Cloud Agent works best as one task per session. Long back-and-forth in the same thread makes each next step cost grow fast.
If the task drags on, start a new chat with fresh context instead of continuing to pile onto the existing one.
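As a rough illustration of the last tip, the sketch below compares one long thread with the same number of rounds split evenly across several fresh chats. It uses a simplified model where each round re-sends the full context and adds a fixed amount to it. All names and figures are hypothetical, and it ignores the cost of re-establishing context in each new chat.

```python
def session_tokens(rounds, base_context, added_per_round):
    # Closed form of base + (base + d) + (base + 2d) + ... over `rounds` rounds.
    return rounds * base_context + added_per_round * rounds * (rounds - 1) // 2


def split_tokens(total_rounds, sessions, base_context, added_per_round):
    # Assumes total_rounds divides evenly and each fresh chat starts at base_context.
    per_session = total_rounds // sessions
    return sessions * session_tokens(per_session, base_context, added_per_round)


# Hypothetical figures: 20k starting context, 5k tokens added per round.
one_thread = session_tokens(188, 20_000, 5_000)
four_chats = split_tokens(188, 4, 20_000, 5_000)
print(one_thread, four_chats)
```

With these made-up figures, four fresh chats use a bit more than a quarter of the tokens of the single long thread, because each chat stops before its context gets large.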