Cursor Auto Mode: Excessive Token Usage

Describe the Bug

Dear Cursor Devs,
dear @condor,

I switched to using cursor pro so I can further manage basic tasks in my codebase using Auto mode.

I just told my ai to git status, add ., commit with message and push which it perfectly did.
But looking at the tokens usage, I really wonder what is going on (see screenshot).
Nearly 300k tokens for this seems massive (easily over 1k DIN A4 Pages of text, if I am not mistaken).

Steps to Reproduce

ask it to do a simple task in Auto mode.

Screenshots / Screen Recordings

Operating System

Linux

Current Cursor Version (Menu → About Cursor → Copy)

Version: 1.2.4
VSCode Version: 1.99.3
Commit: a8e95743c5268be73767c46944a71f4465d05c90
Date: 2025-07-10T16:59:43.242Z
Electron: 34.5.1
Chromium: 132.0.6834.210
Node.js: 20.19.0
V8: 13.2.152.41-electron.0
OS: Linux x64 6.14.0-23-generic

Does this stop you from using Cursor

No - Cursor works, but with this issue

Hi @proteus-dev and thank you for the bug report.

Please add also the Raw event Token details for the event in question.

Could you post a Request ID with privacy disabled so we can look into the details?

about the request ID: I would love to provide more details, but last time I read about the privacy mode, I was left a bit puzzled about what details will be shared with which provider. Is this more transparent now? If so, where can I read about it to make an informed decision?

Yes, understandable. Privacy disabled means the request details would be viewable for Cursor team, that would include also code sent to AI.

Overview on Privacy modes

Full privacy policy

thanks, I also went on searching and just read it.
If I now disable privacy mode and provide the request ID and enable it again… what will be roughly shared for what purpose

  1. after switching it to off and
  2. after switching it on again?

background:
I am from a country with a sometimes weird obsession about data privacy.

After switching it off, requests sent will be stored and are usable by Cursor for improvements.

After switching it on again, new requests won’t be stored, though the shared one still is.

Overall even for stored requests, with account deletion those would be removed.

one moment…
I will provide a request Id following this approach in a sec.

ok I learned a few things:

  1. the documentation of used tokens in the cursor dashboard is rather fluid and may change after executing another (unrelated?) request. For an existing Git request that was already documented, the number of documented tokens used shot up to ~360k tokens.

  2. I removed my rules, which shaved ~100k

  3. I removed the context completly and just provided a path which reduced everything down to 33k

Request Id with the 33k example:
2c9a9498-a0dd-4373-b7b3-112cfe7a0c10

Why I initially saw no problem with my rules or context:
A few weeks ago, I had many requests with the same rules and far more context directly added to the context, doing heavy implementation workloads using fewer tokens.

all these numbers above and what removing my rules and some context shaved off still seem massive to me, including the completly naked 33k request.

ironically, the documented tokens for the 33k example shot up to 45k during writing this post:

From the temporary shared screenshot it looked like the ~80k context was used in ~4-5 tool calls, which comes close to the cache read usage.

Yes the events update for Usage report trails a bit as it’s a summary of a request.

For the request with 3.5k input, were there ~10 tool calls?

why are tool calls so expensive?
After all, these are still roughly 75 DIN A4 pages of text.

the 3.5k request had

  1. cd + git add .
  2. git commit
  3. git config user.name (something went wrong here)
  4. git commit
  5. git push

→ 5 calls

Each tool call is another API request and it uses the provided context (already tokenized, from cache) to provide a response.

So this is expected?
How much do cache reads affect my limit? (where ever the limit might be…)

I will ask the engineers to have a detailed look to make sure nothing is going wrong.

From the screenshots and my experience with AI APIs this is how it works.

  • The first request sends the ‘task’, (incl. system prompt), AI response produces what needs to be done next > Tool call for terminal
  • Tool call happens and response is sent with full preceding chat (cached, so not processed again) to make a response, e.g. next step
  • this repeats until the ‘task’ is complete.

Cache reads are cheapest tokens.

  • Anthropic charges 10% of input token price for cache reads.
  • Gemini charges 25% of input token price for cache reads.

So if we would turn off the cache the input token price would multiply by 10, assuming the model is from Anthropic.

On the new plans the request total cost matters. This is combined by AI API price per token type depending on tokens consumed, model and provider.

Though as auto is not limited it does not contribute to your monthly plan usage (on new plans).

Cache reads may add up on long chat threads and with follow up requests. Its actually a good point to test token usage and understanding of context with Auto.

Thanks!
I already got what I needed but if there will be additional information from the engineers I sure will look at it.

Just curious, auto mode is free, why do you care?

Auto mode is extremely poor quality.

to better understand token usage in general, for non-free models for instance.

for ask to?

Yes you can use Auto mode for asking coding questions but also to see how your requests use tokens.

Awesome, thank you so much!

‫בתאריך יום ד׳, 30 ביולי 2025 ב-11:26 מאת ‪Condor via Cursor - Community Forum‬‏ <‪[email protected]‬‏>:‬