Claude Token Consumption spiked - Why?

Pascal.Engelmann · May 27, 2026, 6:00am

I am just starting out with Cursor and trying out some new features.

I sent a couple of instructions using the Claude Opus 4.7 High Model. I essentially wanted to see what a high-end model can do.

The first runs were okay, with a few million tokens consumed, but the last two runs were excessive, consuming almost 130M and 90M tokens. The requests were just minor changes (add a hover effect, introduce a new color scheme, etc.). It was not drafting anything new, just adjusting and removing elements.

Has anyone else observed this behavior, and how can I better control token consumption?

Tom_Coustols · May 27, 2026, 6:14am

Hey @Pascal.Engelmann, this is due to how models in general deal with context. Each time you send a message in the same thread, all the previous context is resent to the model so it can give accurate answers. What happened here is that your whole previous chat was resent multiple times, each time adding a bit more to the total. And because Cloud Agents are particularly capable, each step carried a lot of context into the next message. Cloud Agents are usually best used to tackle one-off problems, or at least a back-and-forth that doesn't need new or different context, if that makes sense. Other people are free to add to my message. Hope this clears things up for you!

Don’t hesitate if you have more questions!

Tom_Coustols · May 27, 2026, 6:17am

Some great resources about it:

deanrie · May 29, 2026, 6:07am

Hey, Tom explained it correctly. Token usage grows because every agent step re-sends the entire accumulated context back to the model. In your sessions you had 188 and 119 tool-call rounds, so the context kept getting bigger each time. cache_read tokens are cheaper than normal tokens, but they still count toward usage. Most of your 90 to 130M tokens are actually cache_read.
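
To make the growth concrete, here is a rough sketch of the arithmetic. The starting context size (50,000 tokens) and the growth per round (5,000 tokens) are made-up illustrative numbers, not values measured from your sessions, and the function name is hypothetical. Only the round count of 188 comes from the thread.

```python
def total_input_tokens(rounds: int, base_context: int, growth_per_round: int) -> int:
    """Sum the context sizes sent to the model across all tool-call rounds.

    Assumption: every round re-sends the full accumulated context, and the
    context grows by a fixed amount per round (real sessions vary).
    """
    total = 0
    context = base_context
    for _ in range(rounds):
        total += context             # the whole context is sent again this round
        context += growth_per_round  # tool output and replies get appended
    return total


# Illustrative only: 188 rounds, 50k starting context, 5k added per round.
print(f"{total_input_tokens(188, 50_000, 5_000):,}")  # 97,290,000
```

The final context in this sketch is under 1M tokens, yet the total processed is close to 100M, because the sum grows roughly with the square of the number of rounds.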

Also, you’re using the most expensive setup: Claude Opus 4.7 Thinking High with the MAX flag. Thinking mode adds reasoning tokens, MAX expands the context window, and Opus is already much more expensive than Sonnet.

A few practical ways to save:

For small edits like hover tweaks, colors, or minor refactors, Composer 2.5 or Sonnet is enough. Opus High is overkill there.
Cloud Agent works best as one task per session. Long back-and-forth in the same thread makes the cost of each subsequent step grow fast.
If the task drags on, start a new chat with fresh context instead of continuing to pile onto the existing one.
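
As a rough illustration of why a fresh chat helps, the sketch below compares one long session with the same number of rounds split across several fresh sessions. The numbers (50,000 starting tokens, 5,000 added per round) are assumptions for illustration, the function names are hypothetical, and it assumes each fresh chat starts again from the same base context, which may not hold if the agent has to re-read many files.

```python
def session_tokens(rounds: int, base_context: int, growth_per_round: int) -> int:
    """Tokens sent in one session, assuming the full context is re-sent each round.

    Closed form of base + (base + g) + (base + 2g) + ... over `rounds` terms.
    """
    return rounds * base_context + growth_per_round * rounds * (rounds - 1) // 2


def split_tokens(total_rounds: int, sessions: int,
                 base_context: int, growth_per_round: int) -> int:
    """Tokens sent when the same rounds are spread evenly over fresh sessions."""
    per_session, extra = divmod(total_rounds, sessions)
    # The first `extra` sessions take one additional round each.
    sizes = [per_session + 1] * extra + [per_session] * (sessions - extra)
    return sum(session_tokens(n, base_context, growth_per_round) for n in sizes)


# Illustrative only: 188 rounds in one thread vs. four fresh chats.
print(f"{split_tokens(188, 1, 50_000, 5_000):,}")  # 97,290,000
print(f"{split_tokens(188, 4, 50_000, 5_000):,}")  # 31,020,000
```

Under these assumptions, splitting the work into four chats cuts the total to about a third, since each chat stops accumulating before its context gets large.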
