hi @tejasPhaveri, thanks for the report. Cursor shows token usage exactly as returned by the AI provider for each request.
You can open Dashboard → Usage to inspect that specific request in detail, including whether caching was applied. The Billing & Invoices page only shows an aggregated summary.
In general, caching can reduce input token costs by around 90%, but it only helps when there are follow‑up requests that can reuse the same context. From the two highlighted lines in your screenshot, it looks like the -max request was able to cache most of its input tokens, while the single non-max call could not benefit from caching because there were no subsequent related requests. The first request in a context is always billed at the full input token price, and in this case it also appears to have used a large context window.
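To make the billing difference concrete, here is a rough sketch of the arithmetic. The rates and the ~10% cached-read price are hypothetical placeholders for illustration, not Cursor's or any provider's actual pricing:

```python
# Hypothetical per-token rates -- placeholders, not real pricing.
INPUT_RATE = 3.00 / 1_000_000      # $ per uncached input token (assumed)
CACHED_RATE = INPUT_RATE * 0.10    # cached reads at ~10%, i.e. a ~90% discount

def request_cost(total_input_tokens: int, cached_tokens: int) -> float:
    """Cost of one request: cached tokens at the discounted rate,
    the remainder at the full input rate."""
    uncached = total_input_tokens - cached_tokens
    return uncached * INPUT_RATE + cached_tokens * CACHED_RATE

# First request in a thread: nothing is cached yet, so full price.
first = request_cost(232_000, cached_tokens=0)

# A follow-up request that reuses most of the same context.
follow_up = request_cost(232_000, cached_tokens=220_000)
```

Under these assumed numbers the first request costs roughly $0.70 while the follow-up costs about $0.10, which is why a single request with no follow-ups sees no caching benefit at all.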
Could you share the detailed entries for those codex requests from the Usage log so we can double‑check what happened?
Thank you, yes, that was helpful. The one non-max request was indeed 100% input tokens, as there was no follow-up. It also had all 232k tokens as full input context, so the amount charged should be correct.
Do you recall whether you had attached any files/logs/mcps/…? The total token count is quite high.
It is not advisable to change AI models within an existing chat, because on any model change we need to send the full thread to the AI provider for tokenization and processing. In such a case there is no cache available yet for the new model, so the full context is processed at the full input token cost.
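A toy illustration of why switching models resets the cost. This assumes (as described above) that a prompt cache only applies to requests made with the same model that built it; the function and cache structure here are invented for the sketch:

```python
# Assumed behavior: prompt caches are keyed per model, so a mid-thread
# model switch means the next request finds no usable cache entry.
def reusable_cached_tokens(model: str, cache: dict, thread_tokens: int) -> int:
    """Tokens of the thread that can be billed at the cached rate:
    only an entry built with the same model helps, capped at thread size."""
    return min(cache.get(model, 0), thread_tokens)

cache = {"model-a": 200_000}   # context cached over earlier requests on model-a
thread = 232_000

same_model = reusable_cached_tokens("model-a", cache, thread)    # most reused
after_switch = reusable_cached_tokens("model-b", cache, thread)  # nothing reused
```

After the switch, all 232k thread tokens are billed as fresh input, exactly the "full context at input cost" situation described above.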
gotcha, makes sense, curious tho, how does the subagents feature in the nightly build work? i see it goes off and gets context for the main chat, does that also have this input token cost thing?