Cursor Usage Rates Jumping + resource_exhausted errors

I’ve been using Cursor with Ultra for a while and it’s been really good for me, as I’ve enjoyed how low the usage is (especially compared to Claude). Recently, however, I’ve been using millions of tokens per prompt, and in one day I somehow managed to use nearly $200 of usage when on most other days I’m around $10-30. Just now, I’ve been getting an issue where one subagent keeps failing with resource_exhausted.

It probably has something to do with the fact that I’ve been using huge prompts, and I’ve been trying to cut down on that to help with the context bloat. However, Cursor still launches subagents with full essays for a prompt.

Was wondering what other people do to mitigate context bloat from Cursor agent reads causing huge token consumption, and whether there’s another issue that’s going undetected for me right now. I went from having used only about 30% of Auto after weeks of intensive work to nearly 70% after a week where I was on vacation 90% of the time anyway.

Thanks!

I just turned off the Index Repositories for Instant Grep beta as well, but I’m not sure yet how to check whether this was part of the issue.

Hi @Zachicom, thanks for the post! To help us investigate this specifically, could you provide a request ID for one of the problematic prompts after temporarily disabling Privacy Mode? Here’s how you can do that:

To disable Privacy Mode:

  1. Open Cursor Settings with Cmd+Shift+J on macOS or Ctrl+Shift+J on Windows/Linux.
  2. Go to General.
  3. Turn Privacy Mode off / switch to Share Data.

To get the Request ID:

  1. Open the relevant conversation in the Chat sidebar.
  2. Click the ... menu.
  3. Select Copy Request ID.

With this, we can give you a much better answer.

Request ID for the chat where the subagent keeps running into the resource_exhausted error: 747fd83e-e2d7-4b18-bb76-0a816758f485

It likely has something to do with the huge prompts. However, this is the first time I’ve run into the issue, and I used to use large, detailed prompts and larger subagent scopes without running into any usage issues, which is why I’m curious about it right now.

Was wondering if anyone had any updates so far, since I’ve already run through over 5% of my Ultra usage and it’s just barely past noon. I haven’t been making any crazy changes either, and I’ve switched to much shorter prompts and context windows, plus regularly creating new chats.

Hi @Zachicom, I took a look at this and here’s what I found.

The resource_exhausted error your subagent is hitting is not a usage or rate limit. Under the hood, it’s a transient connection interruption on the streamed response between Cursor and the model provider that, unfortunately, surfaces under a confusing “resource_exhausted” label. It is not counted as usage, and it is not your account hitting a cap.

Why does it show up now? These interruptions are most likely on long-running, high-context turns, and large Opus “Extra High” subagents with extended thinking produce exactly that: long, large streamed responses, which have more opportunity to be interrupted mid-stream. Your prompt habits aren’t “wrong”; the bigger, longer subagent runs are simply more exposed to it. Re-running an interrupted turn generally works fine, and these runs retry automatically.
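For anyone curious what "retry automatically" looks like in general, here is a minimal Python sketch of retrying a streamed response after a transient interruption. It is only an illustration, not Cursor's actual implementation: `TransientStreamError`, `run_with_retry`, and `start_stream` are hypothetical names, and the backoff values are arbitrary.

```python
import random
import time


class TransientStreamError(Exception):
    """Hypothetical error standing in for a connection dropped mid-stream."""


def run_with_retry(start_stream, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Consume start_stream() to completion, retrying transient interruptions.

    start_stream is a hypothetical callable that returns a fresh iterator of
    text chunks each time it is called. Partial output from a failed attempt
    is discarded and the whole turn is re-run.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # Joining the chunks forces the whole stream to be read; an
            # interruption anywhere in the stream raises from here.
            return "".join(start_stream())
        except TransientStreamError:
            if attempt == max_attempts:
                raise  # give up and surface the error to the caller
            # Exponential backoff with a little jitter before the next try.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
```

The longer a stream runs, the more chances it has to fail before the final chunk arrives, which is why re-running the whole turn is the simplest recovery. The `sleep` parameter is injectable only so the function can be exercised without real waiting.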

Do you have a VPN enabled? It could be worth toggling it on and off to see whether the behavior changes. Also, please go to Cursor Settings → Network → Diagnostics, run the Diagnostics test there, and make sure that everything checks out.

On the usage increase: that is a separate item. It reflects genuinely heavier use of the most powerful model tier (large contexts across many turns). Caching on your account is actually working well, so the rise is volume, not waste. Let me know if you have any other requests that you think are problematic and would like us to take a look at.
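If you want to sanity-check the "volume, not waste" distinction yourself, a rough Python sketch like the one below can help. It assumes you have per-request token counts available in some form; the row keys `cache_read`, `input`, and `output` are hypothetical and may not match the names in your usage data.

```python
def summarize_usage(rows):
    """Summarize token usage across requests.

    rows is a list of dicts with hypothetical keys:
      "cache_read": input tokens served from the prompt cache
      "input":      input tokens that were not served from the cache
      "output":     tokens generated by the model
    """
    cache_read = sum(r["cache_read"] for r in rows)
    fresh_input = sum(r["input"] for r in rows)
    output = sum(r["output"] for r in rows)

    total_input = cache_read + fresh_input
    # Share of input tokens that came from the cache (0.0 if there was no input).
    cached_share = cache_read / total_input if total_input else 0.0

    return {
        "total_tokens": total_input + output,
        "cached_share": cached_share,
    }
```

A high `cached_share` alongside a rising `total_tokens` points to more work being done rather than context being re-sent uncached; a low `cached_share` would be the thing worth investigating.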