I swear Cursor has NEVER used this many tokens per request. I've been using Cursor for a long time now, usually under wayyyyyy heavier workloads, and I have never gotten even close to $5 of usage in less than a day. While tracking my usage I've also noticed that sometimes, even when the model does 0 tool calls, it isn't the cached read tokens that blow up but the "Input" tokens. How does this make sense?

I asked it how I'd package the extension I'm building for distribution. My input was 113 tokens and the output was 900 tokens according to tokencounter, yet the dashboard claims I had 92k input tokens and 4k output tokens and "charged" me 16 cents. This does not make any sense, as there were 0 tool calls. After doing some research, I can see many people are hitting their monthly limit extremely quickly, so this is just very weird.
I just don't understand how a request with 0 tool calls, a 113-token input, and a 900-token output shows up in the usage dashboard as 92k input tokens and "16 cents" of usage.
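Just to show why the 16 cents lines up with the 92k figure (and not with what I actually sent), here's a quick back-of-the-envelope calculation. The per-token rates are an assumption based on the published GPT-5 API list prices; Cursor's actual rates or margins may differ.

```python
# Sanity check on the billing math. Rates below are ASSUMED from OpenAI's
# published GPT-5 API list prices ($1.25 per 1M input tokens, $10 per 1M
# output tokens); Cursor's real pricing may differ.

INPUT_RATE = 1.25 / 1_000_000    # assumed dollars per input token
OUTPUT_RATE = 10.00 / 1_000_000  # assumed dollars per output token

def cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated charge in dollars for a single request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# What the dashboard claims: 92k input + 4k output
print(f"dashboard numbers: ${cost(92_000, 4_000):.3f}")  # ~$0.155, i.e. the "16 cents"

# What I actually sent/received: 113 input + 900 output
print(f"my numbers:        ${cost(113, 900):.4f}")       # ~$0.0091, well under a cent
```

So the charge itself is consistent with the 92k input count; the question is where those 92k tokens are supposed to have come from.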
This was GPT-5-High. What doesn't make sense to me is that the only way I can imagine the tokens SOMEHOW getting that high is if the model did a ton of thinking and that thinking got factored into the input token count. But here's the thing: on all the requests I have that show 1-2.1 million cache read tokens, where it literally spent like 5 minutes thinking before it gave an output, the "input tokens" are only around 5k. So if the model can think for that long in those requests and still only have 5k input tokens, how can a model that thinks for 1 minute, does no tool calling, gets a 113-token input, and outputs 900 tokens worth of text end up with a 92k input token count?
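For what it's worth, in the raw OpenAI API the thinking/reasoning tokens are reported on the completion (output) side of the usage object, not the prompt (input) side. The field names below are from OpenAI's Chat Completions usage object; the actual numbers are hypothetical, and I obviously can't see how Cursor maps any of this to its dashboard, so treat that mapping as an assumption.

```python
# Rough sketch of how the raw OpenAI API reports usage. Field names are from
# the Chat Completions usage object; the values are HYPOTHETICAL, just to show
# which bucket thinking would land in.

example_usage = {
    "prompt_tokens": 113,                 # what I actually sent (the "Input" side)
    "completion_tokens": 4_000,           # visible output PLUS hidden reasoning
    "prompt_tokens_details": {
        "cached_tokens": 0,               # cache reads come out of the prompt side
    },
    "completion_tokens_details": {
        "reasoning_tokens": 3_100,        # hypothetical: thinking is counted as OUTPUT here
    },
}

# If thinking were the explanation, it should inflate completion_tokens,
# not prompt_tokens -- which is exactly why the 92k "input" figure is confusing.
visible_output = (example_usage["completion_tokens"]
                  - example_usage["completion_tokens_details"]["reasoning_tokens"])
print(visible_output)  # ~900, roughly what tokencounter measured for the visible reply
```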