Is composer-2-fast eating tokens for breakfast? 6.3M tokens in ONE request!

deanrie · May 14, 2026, 7:13pm

Hey, let’s clear this up.

That 6.3M tokens-per-request number in the usage dashboard isn’t what actually went into the context window in a single model call. Composer 2 has a 200k token window, and each individual server-side LLM call stays within that limit. The dashboard sums input + output + cache_read + cache_write across all turns in one agent conversation. On a long agent run with dozens of tool calls, the numbers add up turn over turn, because each later turn re-sends the growing history, and most of that ends up coming from cache.
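To make the turn-over-turn accumulation concrete, here is a rough toy model in Python. It is a sketch under loud assumptions, not how the dashboard is actually computed: the per-turn sizes are made up, the whole prior history is assumed to be served from cache, and cache_write is left at zero to keep things simple. Only the 200k window comes from the post above.

```python
WINDOW = 200_000  # Composer 2 context window, as stated in the post


def simulate_run(turns, new_tokens_per_turn, output_per_turn):
    """Toy model: sum token counts by type across every turn of one agent run.

    Assumptions (hypothetical, for illustration only):
    - every turn adds the same number of fresh input tokens and output tokens
    - all history from earlier turns is re-sent and served from the prompt cache
    - cache_write is not modeled and stays at 0
    """
    totals = {"input": 0, "output": 0, "cache_read": 0, "cache_write": 0}
    history = 0        # tokens accumulated in the conversation so far
    largest_call = 0   # biggest single model call seen during the run
    for _ in range(turns):
        call_size = history + new_tokens_per_turn + output_per_turn
        if call_size > WINDOW:
            raise ValueError("a single call would exceed the context window")
        totals["cache_read"] += history          # old history, read from cache
        totals["input"] += new_tokens_per_turn   # fresh, uncached input
        totals["output"] += output_per_turn
        largest_call = max(largest_call, call_size)
        history = call_size                      # next turn re-sends all of this
    return totals, largest_call


totals, largest_call = simulate_run(turns=40, new_tokens_per_turn=3_000, output_per_turn=1_000)
print(totals)                 # per-type totals across the whole run
print(sum(totals.values()))   # the kind of summed figure a dashboard would show
print(largest_call)           # still under the 200k window
```

With these made-up numbers, no single call goes above 160k tokens, yet the summed figure lands in the millions, and almost all of it is cache_read.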

Important: cache read tokens are billed, but at a much lower rate, around 10% of normal input. So 6M cache reads and 6M fresh input are very different in cost. Colin broke down the numbers in "Why does Cursor consume an absurd amount of cache read tokens?" (post #24, by Colin).
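A quick back-of-the-envelope comparison, as a Python sketch. The price of 1.00 per million input tokens is a placeholder, not a real Composer rate, and the 10% ratio is the approximate figure mentioned above, so treat the result as a ratio rather than an actual bill.

```python
def input_side_cost(fresh_tokens, cache_read_tokens, price_per_million, cache_read_ratio=0.10):
    """Estimate input-side cost when cache reads are billed at a fraction of fresh input.

    price_per_million is a hypothetical placeholder price; cache_read_ratio=0.10
    reflects the "around 10%" figure from the post and is only approximate.
    """
    fresh_cost = fresh_tokens / 1_000_000 * price_per_million
    cached_cost = cache_read_tokens / 1_000_000 * price_per_million * cache_read_ratio
    return fresh_cost + cached_cost


PLACEHOLDER_PRICE = 1.00  # hypothetical price per 1M fresh input tokens

all_fresh = input_side_cost(6_000_000, 0, PLACEHOLDER_PRICE)
all_cached = input_side_cost(0, 6_000_000, PLACEHOLDER_PRICE)

print(round(all_fresh, 2))   # cost if all 6M tokens were fresh input
print(round(all_cached, 2))  # cost if all 6M tokens were cache reads
```

Under those assumptions, 6M cache reads cost about a tenth of what 6M fresh input tokens would, whatever the real per-token price is.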

On your questions:

1. You can’t directly change the Composer context window limit. It’s fixed at 200k. But you can control what gets included in it.
2. Re-uploading millions of tokens between turns is basically cache reads. Technically they’re re-sent, but in practice they’re pulled from the provider’s prompt cache, so the price is much lower than full input.
3. Indexing and context timing for a large CAD project:
- Use .cursorignore for heavy vendor libs, generated files, geometry assets, binaries. Syntax is like .gitignore. See "Ignore files" in the Cursor Docs.
- Start a new chat for each unrelated task. Long history means more cache reads turn over turn.
- Use targeted @file or @folder mentions instead of letting the agent roam the tree. codebase_search pulls relevant chunks, but on huge projects it can expand more than you want.
- For routine edits, try composer-2 without -fast. The fast variant is optimized for speed and costs a lot more.
- Use Plan mode before running. First scope what to touch, then execute.
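As an illustration of the .cursorignore tip, here is a minimal sketch of what such a file might look like for a CAD-heavy repo. Every path and extension below is hypothetical and should be replaced with what actually exists in your project; the only thing taken from the post is that the syntax follows .gitignore.

```gitignore
# .cursorignore (hypothetical entries, adjust to your repo layout)

# Heavy vendor libraries
vendor/
third_party/

# Generated files and build output
build/
generated/

# Geometry assets and binaries (example extensions only)
*.step
*.stl
*.bin
```

Anything matched here is kept out of indexing, so it is worth starting with the largest directories and checking whether the token counts in the dashboard drop.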

If you open that specific 6.3M request in the dashboard, you should see a breakdown by token type (input vs cache_read vs output). That’ll show how much of it was billed at the cheaper rate.
