Is composer-2-fast eating tokens for breakfast? 6.3M tokens in ONE request!

Hi everyone,

I’m working on a professional CAD-related project and I’ve been using composer-2-fast recently. While the speed is impressive, the token consumption has become absolutely insane.

Look at my recent usage log:

  • 03:02 PM: 6,365,000 tokens (One single request!)

  • 02:54 PM: 4,380,000 tokens

  • 09:57 AM: 4,766,000 tokens

In just a few hours, I’ve burned through tens of millions of tokens. It seems like the RAG/Index strategy is pulling in way too much context (possibly my entire CAD IR and geometry processing libraries) even for small logical changes.

My Questions:

  1. Is there a way to limit the “Context Window” specifically for Composer?

  2. Why does it re-scan and re-upload millions of tokens for consecutive edits?

  3. I’m considering moving to OpenCode just to gain manual control over the context. Does anyone have tips on how to “tame” the indexing beast in Cursor?

Currently, the UX is great, but the cost/quota efficiency is becoming a deal-breaker for large-scale engineering projects.


Hey, let’s clear this up.

That 6.3M tokens per request figure in the usage dashboard isn’t what went into the context window in a single model call. Composer 2 has a 200k token window, and each individual server-side LLM call stays within that limit. The dashboard sums input + output + cache_read + cache_write across all turns in one agent conversation. On a long agent run with dozens of tool calls, the numbers add up turn over turn, because each later turn re-sends the growing history, and most of that ends up coming from cache.
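To make the accumulation concrete, here is a rough sketch of how a per-conversation total can reach millions while no single call exceeds the window. The starting context, tokens added per turn, and turn count are made-up numbers for illustration, and the model of "history grows by a fixed amount each turn" is a simplifying assumption, not how Cursor actually meters usage.

```python
WINDOW = 200_000  # per-call context limit mentioned above


def simulate_run(base_context, added_per_turn, turns):
    """Return (largest single call, dashboard-style total), both in tokens.

    Hypothetical model: every turn re-sends the whole history so far
    (mostly served from cache) and then adds a fixed number of new tokens.
    """
    history = base_context
    largest_call = 0
    total = 0
    for _ in range(turns):
        # Assumption: a single call is capped at the window size.
        call_input = min(history, WINDOW)
        largest_call = max(largest_call, call_input)
        # The total counts the re-sent history plus the newly produced tokens.
        total += call_input + added_per_turn
        history += added_per_turn
    return largest_call, total


largest, total = simulate_run(base_context=60_000, added_per_turn=2_000, turns=50)
print(f"largest single call: {largest:,} tokens")
print(f"summed over the run: {total:,} tokens")
```

With these illustrative inputs the largest call is 158,000 tokens, yet the summed figure is 5,550,000, which is the same order of magnitude as the numbers in the usage log.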

Important: cache read tokens are billed, but at a much lower rate, around 10% of normal input. So 6M cache reads and 6M fresh input are very different in cost. Colin broke down the numbers here: Why does Cursor consume an absurd amount of cache read tokens? - #24 by Colin
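As a back-of-the-envelope check, the snippet below converts a token breakdown into "fresh-input-equivalent" tokens using the roughly 10% cache read rate mentioned above. The rate is approximate, the example breakdown is hypothetical, and output and cache_write tokens are left out for simplicity, so treat this as a sketch rather than a billing formula.

```python
# "around 10% of normal input" per the post above; approximate, not an official rate
CACHE_READ_DISCOUNT = 0.10


def input_equivalent_tokens(fresh_input, cache_read):
    """Weight cache reads by the discount to compare against fresh input."""
    return fresh_input + cache_read * CACHE_READ_DISCOUNT


# Hypothetical comparison: 6M tokens all fresh vs. 6M tokens all from cache
all_fresh = input_equivalent_tokens(fresh_input=6_000_000, cache_read=0)
all_cached = input_equivalent_tokens(fresh_input=0, cache_read=6_000_000)
print(f"all fresh:  {all_fresh:,.0f} input-equivalent tokens")
print(f"all cached: {all_cached:,.0f} input-equivalent tokens")
```

Under that assumed rate, 6M cache reads cost about as much as 600k tokens of fresh input.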

On your questions:

  1. You can’t directly change the Composer context window limit. It’s fixed at 200k. But you can control what gets included in it.
  2. The millions of tokens “re-uploaded” between turns are mostly cache reads. Technically they’re re-sent, but in practice they’re pulled from the provider’s prompt cache, so the price is much lower than full input.
  3. Taming indexing and context for a large CAD project:
    • Use .cursorignore for heavy vendor libs, generated files, geometry assets, binaries. Syntax is like .gitignore. Docs: Ignore files | Cursor Docs
    • Start a new chat for each unrelated task. Long history means more cache reads turn over turn.
    • Use targeted @file or @folder mentions instead of letting the agent roam the tree. codebase_search pulls relevant chunks, but on huge projects it can expand more than you want.
    • For routine edits, try composer-2 without -fast. The fast variant is optimized for speed and costs a lot more.
    • Use Plan mode before running. First scope what to touch, then execute.
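
For the .cursorignore tip above, a starting point might look like the fragment below. Every path and extension here is a hypothetical placeholder (I don’t know your repo layout), so adjust the entries to wherever your vendor libraries, generated output, and geometry assets actually live. The patterns use .gitignore-style syntax.

```
# .cursorignore (example only; all paths below are placeholders)

# Third-party / vendored libraries
vendor/
third_party/

# Build output and generated files
build/
generated/

# Geometry assets and binaries
*.step
*.stl
*.bin
```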

If you open that specific 6.3M request in the dashboard, you should see a breakdown by token type (input vs cache_read vs output). That’ll show how much of it was billed at the cheaper rate.