Understanding LLM Token Usage

Here is a detailed explanation of LLM (AI) token usage in Cursor:

Tokens are consumed every time you interact with AI models in Cursor. For transparency, token counts are shown exactly as processed and reported by the AI provider. You can check your usage in Dashboard > Usage and, if you have usage-based pricing enabled, also in Dashboard > Billing. The dashboard is being improved to show all AI usage of a single request on one line, even when multiple tool calls are combined.

Tokens are divided into four groups:

  • Input tokens: The original user request, including attached rules, files, MCPs, and so on. Input tokens are further split into:

    • Input with cache write: Information that is cached so future steps can reuse it efficiently.
    • Input without cache write: Information used only for a single step and not cached.

  • Output tokens: The AI’s responses, such as code, chat replies, and tool calls.

  • Cache Write tokens: Messages and tool call results that are saved in the AI provider’s temporary cache to make future requests more efficient.

  • Cache Read tokens: Cached tokens (chat history and context) reused in later steps to generate new AI output. These are cheaper, usually costing about 10–25% of the input token price, depending on the provider.

Each token group is counted and priced at the provider’s API cost × 1.2, and the total token count is displayed.
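The pricing rule above can be sketched in a few lines. The per-token prices below are placeholders I made up for illustration; only the × 1.2 multiplier comes from the text, and real rates vary by model and provider.

```python
# Hypothetical provider prices in USD per million tokens; placeholders
# only -- real prices differ by model and provider.
PROVIDER_PRICE_PER_MTOK = {
    "input": 3.00,
    "output": 15.00,
    "cache_write": 3.75,
    "cache_read": 0.30,  # roughly 10% of the assumed input price
}

MARGIN = 1.2  # each group is priced at provider API cost * 1.2


def request_cost(usage: dict) -> float:
    """Sum the cost of each token group at provider price * 1.2."""
    return sum(
        tokens / 1_000_000 * PROVIDER_PRICE_PER_MTOK[group] * MARGIN
        for group, tokens in usage.items()
    )


cost = request_cost({"input": 5_000, "output": 1_000, "cache_read": 10_000})
print(f"${cost:.4f}")  # input $0.018 + output $0.018 + cache read $0.0036
```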

How a request flows:

  1. Your initial request, along with context, is sent as input tokens.
  2. AI processes the input and generates output (output tokens).
  3. Tool calls may be triggered, such as reading files or searching code. These calls use the current context and add more tokens to it.
  4. Each tool call’s response is cached, increasing the cache size.
  5. If more steps are needed, the process repeats: new input is sent, more output is generated, and additional tool calls may occur.
  6. Follow-up requests in a chat work the same way. Each step adds to the cache, and cache read tokens accumulate as the context grows.
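The flow above can be modeled as a simple accounting loop: the first request is sent as input and written to cache, and every subsequent tool-call step re-reads the accumulated cache before appending its own result. This is a toy sketch with illustrative token counts, not Cursor's actual accounting code.

```python
def simulate_steps(initial_input: int, tool_results: list) -> dict:
    """Toy model of the request flow: step 1 sends and caches the initial
    context; each later step re-reads the whole cache and writes its tool
    result back into it. All token counts are illustrative."""
    totals = {"input": 0, "cache_write": 0, "cache_read": 0}
    cache = 0

    totals["input"] += initial_input        # step 1: request + context
    totals["cache_write"] += initial_input  # cached for later steps
    cache += initial_input

    for result in tool_results:             # steps 3-5: tool calls repeat
        totals["cache_read"] += cache       # prior context read from cache
        totals["cache_write"] += result     # result appended to the cache
        cache += result

    totals["cache_size"] = cache
    return totals


print(simulate_steps(5_000, [5_000, 3_000]))
```

Note how cache reads grow with every step even though fresh input stays flat, which is why long agent sessions accumulate large cache-read counts.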

Example:

  • A request starts with 5,000 input tokens, which are processed and cached.
  • A tool call uses the cached context and adds another 5,000 tokens, which are also cached.
  • After two steps, the cache holds 10,000 tokens. When these are used in the next API call to the AI provider, they are counted as cache read tokens at a reduced cost (10–25% of the input token price, depending on the provider).
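The arithmetic in this example is worth making explicit. The input price and the 10% cache-read fraction below are assumptions for illustration; the source only gives the 10–25% range.

```python
# Worked arithmetic for the example above. Both constants are assumed.
INPUT_PRICE_PER_MTOK = 3.00  # USD per million input tokens (hypothetical)
CACHE_READ_FRACTION = 0.10   # 10-25% of the input price, per the provider

cached = 5_000 + 5_000                   # two cached steps of 5,000 tokens
as_input = cached / 1_000_000 * INPUT_PRICE_PER_MTOK
as_cache_read = as_input * CACHE_READ_FRACTION

# 10,000 tokens would cost $0.03 re-sent as fresh input, but only
# $0.003 when served as cache reads at the assumed 10% rate.
print(cached, round(as_input, 4), round(as_cache_read, 4))
```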

If you see high token counts:

  • Input tokens: Too much context (rules, files, etc.) is attached. Reduce by removing unnecessary parts.

  • Output tokens: The AI’s response is large. Limit output if possible.

  • Cache Write tokens: A lot of context is being processed and cached. Streamline your instructions and attachments.

  • Cache Read tokens: These increase with long or complex chats, as context grows with each step. This is normal for chats with multiple tool calls and follow-up requests, but can be reduced by starting a new chat for each new task.

Tips to reduce token usage:

  • Attach only what’s necessary; modern AI can fetch relevant code efficiently.
  • Only enable MCP tools you need.
  • Keep rules short and focused.
  • Make tasks specific and focused.
  • Start a new chat for a new task.

Model selection:

  • Use simpler models (Auto) for basic tasks like log checks or code analysis.
  • Use stronger models (e.g. Sonnet) only when needed for complex coding.
  • “Thinking” models use far more tokens than standard ones.
  • Use strongest models (e.g. Opus) when strong models are not sufficient.

Note: Switching AI providers or models in the middle of a chat can increase token usage, since the new provider doesn’t have the previous chat steps in its cache and must process them as new input. For best results, switch models or providers in a new chat.
