I’m working on a professional CAD-related project and I’ve been using composer-2-fast recently. While the speed is impressive, the token consumption has become absolutely insane.
Look at my recent usage log:
03:02 PM: 6,365,000 tokens (One single request!)
02:54 PM: 4,380,000 tokens
09:57 AM: 4,766,000 tokens
In just a few hours, I’ve burned through tens of millions of tokens. It seems like the RAG/Index strategy is pulling in way too much context (possibly my entire CAD IR and geometry processing libraries) even for small logical changes.
My Questions:
Is there a way to limit the “Context Window” specifically for Composer?
Why does it re-scan and re-upload millions of tokens for consecutive edits?
I’m considering moving to OpenCode just to gain manual control over the context. Does anyone have tips on how to “tame” the indexing beast in Cursor?
Currently, the UX is great, but the cost/quota efficiency is becoming a deal-breaker for large-scale engineering projects.
That 6.3M-tokens-per-request figure in the usage dashboard isn’t what actually fit into the context window in a single model call. Composer 2 has a 200k-token window, and each individual server-side LLM call stays within that limit. The dashboard sums input + output + cache_read + cache_write across all turns of one agent conversation. On a long agent run with dozens of tool calls, the numbers add up turn over turn, because each later turn re-sends the growing history, and most of that history is served from cache.
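To see how one agent conversation can reach millions of dashboard tokens while every call stays under 200k, here’s a rough Python model. It rests on my own simplifying assumptions, not Cursor’s actual accounting: each turn re-sends the whole history, the previously sent prefix is served from cache, all newly appended tokens are written to cache, and the history grows by a fixed amount per turn. All numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class TurnUsage:
    """Token counts for one model call, mirroring the dashboard categories."""
    input_tokens: int  # uncached input; assumed 0 in this toy model
    cache_read: int
    cache_write: int
    output: int

    @property
    def prompt_size(self) -> int:
        # Everything sent to the model on this call (must fit the window).
        return self.input_tokens + self.cache_read + self.cache_write

    @property
    def total(self) -> int:
        # What the dashboard adds up: input + output + cache_read + cache_write.
        return self.prompt_size + self.output


def simulate_agent_run(initial_prompt: int, growth_per_turn: int,
                       output_per_turn: int, turns: int,
                       window: int = 200_000) -> list[TurnUsage]:
    """Toy model (assumption): every turn re-sends the full history."""
    usages: list[TurnUsage] = []
    history = initial_prompt  # tokens in the conversation so far
    cached_prefix = 0         # tokens already in the provider's prompt cache
    for _ in range(turns):
        if history > window:
            raise ValueError("history exceeds the context window")
        new_tokens = history - cached_prefix
        usages.append(TurnUsage(input_tokens=0, cache_read=cached_prefix,
                                cache_write=new_tokens, output=output_per_turn))
        cached_prefix = history
        # The model's reply plus tool results are appended before the next turn.
        history += output_per_turn + growth_per_turn
    return usages


run = simulate_agent_run(initial_prompt=30_000, growth_per_turn=3_000,
                         output_per_turn=1_000, turns=40)
print(f"largest single call: {max(u.prompt_size for u in run):,}")  # 186,000
print(f"dashboard total:     {sum(u.total for u in run):,}")        # 4,360,000
print(f"of which cache_read: {sum(u.cache_read for u in run):,}")   # 4,134,000
```

With these made-up inputs the largest single call is 186k tokens, yet the summed total is about 4.36M, the same order of magnitude as the entries in the log above, and roughly 95% of it is cache_read.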
You can’t directly change the Composer context window limit. It’s fixed at 200k. But you can control what gets included in it.
The millions of “re-uploaded” tokens between turns are mostly cache reads. Technically they’re re-sent, but in practice they’re served from the provider’s prompt cache, so they’re billed at a much lower rate than fresh input.
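The billing effect of cache reads can be sketched with a small Python calculation. Every price below is a placeholder I made up (including the assumption that cache reads cost 10% of fresh input and cache writes 125%), and the token breakdown is hypothetical too; plug in the real per-category rates and counts from your own plan and dashboard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prices:
    """Hypothetical USD prices per 1M tokens. NOT Cursor's or any provider's real rates."""
    input: float
    cache_read: float
    cache_write: float
    output: float


def cost_usd(tokens: dict[str, int], prices: Prices) -> float:
    """Bill each token category at its own per-1M-token rate."""
    rates = {
        "input": prices.input,
        "cache_read": prices.cache_read,
        "cache_write": prices.cache_write,
        "output": prices.output,
    }
    return sum(tokens.get(kind, 0) * rate for kind, rate in rates.items()) / 1_000_000


# Hypothetical breakdown of one ~4.36M-token agent request.
breakdown = {"input": 50_000, "cache_read": 4_100_000,
             "cache_write": 170_000, "output": 40_000}

# Assumed ratios: cache reads at 10% of input, cache writes at 125%.
with_cache = Prices(input=1.00, cache_read=0.10, cache_write=1.25, output=5.00)
# The same request if every prompt token were billed as fresh input.
no_cache = Prices(input=1.00, cache_read=1.00, cache_write=1.00, output=5.00)

print(f"with cache: ${cost_usd(breakdown, with_cache):.2f}")  # $0.87
print(f"no cache:   ${cost_usd(breakdown, no_cache):.2f}")    # $4.52
```

Under these assumed rates the cached run costs roughly a fifth of what the same token count would cost if it were all billed as fresh input.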
Tuning indexing and context for a large CAD project:
Use .cursorignore to exclude heavy vendor libs, generated files, geometry assets, and binaries. The syntax is the same as .gitignore. Docs: Ignore files | Cursor Docs
Start a new chat for each unrelated task. Long history means more cache reads turn over turn.
Use targeted @file or @folder mentions instead of letting the agent roam the tree. codebase_search pulls relevant chunks, but on huge projects it can expand more than you want.
For routine edits, try composer-2 without -fast. It’s cheaper, and the fast variant costs a lot more because it’s optimized for speed.
Use Plan mode before running. First scope what to touch, then execute.
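For a CAD codebase, a .cursorignore along these lines might be a reasonable starting point. The directory names and file extensions are hypothetical placeholders for typical vendored libraries, build output, geometry assets, and binaries; adjust them to your actual tree.

```gitignore
# Vendored / third-party libraries (hypothetical paths)
third_party/
vendor/

# Build output and generated code (hypothetical paths)
build/
out/
generated/

# Geometry and mesh assets
*.step
*.stp
*.iges
*.igs
*.stl
*.obj

# Compiled binaries and libraries
*.exe
*.dll
*.so
*.a
*.lib
```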
If you open that specific 6.3M request in the dashboard, you should see a breakdown by token type (input vs cache_read vs output). That’ll show how much of it was billed at the cheaper cache rate.