Hey all!
This is effectively just going to rephrase @Andres_Cardona’s answer, but I hope it’s helpful. It might sound like it’s talking to beginners, but I want to make sure this is approachable for anybody reading!
When you hover over Tokens on your usage page, the number you see is the aggregate across every LLM call that contributed to that request, not a single call.
A single message in Cursor can (and typically does) trigger multiple LLM calls under the hood. The agent may read files, invoke tools, apply edits, or reason through a plan, and each of those steps constitutes a separate call. All of them are rolled up into a single row on the dashboard, and aggregation stops only when the request is complete (when you can type a new message).
To illustrate: suppose your first message sends 20k tokens of context, and it takes 10 LLM calls to finish. You’d see 20k input tokens and roughly 180k cached tokens: the first call sends the 20k prefix fresh, and each of the 9 subsequent calls reuses that same prefix, which the provider already has cached (9 × 20k = 180k). Those cached tokens also carry forward to the next message within the same conversation.
This is also why you might see a total token count that exceeds the model’s context window. It isn’t one enormous call, but the sum of all calls made during that turn.
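If it helps to see the arithmetic, here is a rough Python sketch of the 20k example above. It is a simplified model and not how Cursor actually computes usage: it assumes the prefix stays exactly the same size on every call, and it ignores output tokens and anything the agent adds along the way (tool results, file contents). The function name `aggregate_usage` and its parameters are made up for illustration.

```python
def aggregate_usage(context_tokens: int, num_calls: int) -> dict:
    """Sum token usage across all LLM calls in one turn (simplified model).

    Assumption: every call sends the same fixed-size prefix, and only the
    first call pays for it as fresh input. Real usage grows with each step.
    """
    if num_calls < 1:
        raise ValueError("num_calls must be at least 1")

    # The first call sends the prefix uncached.
    input_tokens = context_tokens
    # Every later call reads the same prefix from the provider's cache.
    cached_tokens = context_tokens * (num_calls - 1)

    return {
        "input": input_tokens,
        "cached": cached_tokens,
        "total": input_tokens + cached_tokens,
    }


usage = aggregate_usage(context_tokens=20_000, num_calls=10)
print(usage)  # {'input': 20000, 'cached': 180000, 'total': 200000}
```

Note that the total (200k) is ten times the size of any single call, which is how the dashboard row can end up larger than the model’s context window even though no individual call was.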
If you’re curious about what’s consuming your tokens, you can ask the agent directly: “What’s in your context window right now? Be exhaustive.”
We’re always looking to make the context window more efficient!