Does anyone experience issues with unpredictable token usage?
On a fairly trivial request to change a few function signatures, GPT-5.4-high thought for a cumulative total of 76 seconds and modified 152 lines of code over the course of maybe 12 to 15 minutes. A long time, but certainly within the bounds of reason. But somehow this used 40.2 million tokens?
I don’t know how this is possible. Miscounting? A bug? Inefficient tool use? The issue is that I don’t even know where the tokens were used. Over the previous 6 hours of work, I think I consumed around 42 million tokens using a combination of GPT-5.4-high and 5.4-xhigh.
How do people manage to use Cursor reliably if usage can vary by a factor of 30 between requests of similar complexity and duration, and there is no visibility into where the tokens go, which makes usage impossible to understand or control?
This is likely due to a large number of cached tokens being read. You can verify this by hovering over the token count in your usage dashboard, which shows the breakdown. Cache reads are cheaper than regular input tokens (see model pricing).
I’ve written a more detailed explanation of this behavior and why users sometimes see such high token counts. It might be interesting for you!
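To give a rough feel for how cache reads can reach tens of millions of tokens on a small edit, here is a minimal sketch in Python. It assumes a simplified agent loop in which every model call re-sends the whole conversation so far, and everything already sent on an earlier call is counted as a cache read. The function name, the parameters, and all the numbers are hypothetical illustrations, not Cursor's actual accounting, and output tokens are ignored.

```python
def estimate_token_usage(initial_context, tokens_added_per_step, steps):
    """Rough estimate of cumulative input tokens for an agent loop.

    Assumption (hypothetical, not Cursor's real accounting): each model call
    re-sends the full conversation, and the part that was already sent on the
    previous call is billed as a cache read instead of fresh input.
    """
    cache_read = 0
    fresh_input = 0
    context = initial_context   # tokens in the conversation right now
    cached_prefix = 0           # tokens already seen by a previous call

    for _ in range(steps):
        cache_read += cached_prefix             # re-read of the old prefix
        fresh_input += context - cached_prefix  # only the new part is fresh
        cached_prefix = context
        context += tokens_added_per_step        # tool output, edits, replies

    return {
        "cache_read": cache_read,
        "fresh_input": fresh_input,
        "total": cache_read + fresh_input,
    }


# Hypothetical session: 100k tokens of starting context, about 2k tokens
# added per tool call, 150 model calls in total.
usage = estimate_token_usage(100_000, 2_000, 150)
print(usage)
```

With these made-up inputs, fresh input stays under half a million tokens while cache reads land in the tens of millions, because the same growing prefix is counted again on every call. That is the pattern the hover breakdown in the usage dashboard should let you confirm or rule out.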