Auto isn’t a separate model. It’s a routing mode that picks an underlying model per request, and it inherits that model’s prompt caching behavior when the request is cache-eligible. So caching is happening; it just follows whichever model the request got routed to.
The catch is hit rates: when the route shifts, it can’t always reuse cache entries created for a different underlying model/provider, so hit rates can be lower than when you pin one model.
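To make the hit-rate point concrete, here is a toy sketch. It assumes cache entries are keyed per underlying model and never expire; the model names and the `count_cache_hits` helper are made up for illustration and do not describe how Cursor or any provider actually implements caching.

```python
def count_cache_hits(routes, prefix="shared-context"):
    """Count cache hits when the same prompt prefix is sent along `routes`.

    Assumption (hypothetical): an entry written for one model cannot be
    reused by another, so the cache key is the pair (model, prefix).
    """
    cache = set()
    hits = 0
    for model in routes:
        key = (model, prefix)
        if key in cache:
            hits += 1          # prefix already cached for this model
        else:
            cache.add(key)     # first request on this model writes the cache
    return hits


# Six requests pinned to one model vs. six requests whose route shifts.
pinned = ["model-a"] * 6
shifting = ["model-a", "model-b", "model-a", "model-c", "model-b", "model-a"]

print("pinned hits:  ", count_cache_hits(pinned))
print("shifting hits:", count_cache_hits(shifting))
```

In this toy setup the pinned sequence misses only once, while the shifting sequence misses once per distinct model it touches. Real caches also expire entries, which this sketch ignores.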
Also, high token counts don’t necessarily mean caching isn’t happening — they often reflect the total context considered, including files, prior turns, rules, and other injected context.
If you were running complex long-running agent tasks with Auto, then it’s totally possible that those could have run up more tokens than the tasks you performed with Claude Opus.
Thanks for flagging this. I took a look at the usage rows from Apr 22, including the Sonnet request and the git-related Composer request.
The large token totals shown in the dashboard are real, but they are mostly cache reads rather than fresh input tokens. For example, the Sonnet request showing about 1.4M total tokens had almost no fresh input in the final model call. The total was mainly cached context being reused across multiple internal agent steps, plus a cache write and output tokens.
The git-related request follows the same pattern: the dashboard total was driven mostly by cached context reads across several Composer agent steps. It does not look like your 3k-token plan was sent as 1.4M fresh input.
To be clear, we do not see evidence of a bug or incorrect token accounting for these requests. The high totals are expected behavior when Composer is working with a large active context and reusing that context across multiple internal steps.
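As a rough illustration of how a small prompt can show up as a very large dashboard total, here is a sketch that sums usage over several agent steps. The step count, the per-step numbers, and the `StepUsage` fields are all hypothetical, chosen only so the total lands near 1.4M; they are not the actual figures from these requests.

```python
from dataclasses import dataclass


@dataclass
class StepUsage:
    """Token counts for one internal agent step (hypothetical field names)."""
    fresh_input: int   # new, uncached input tokens
    cache_read: int    # cached context re-read in this step
    cache_write: int   # context written to the cache in this step
    output: int        # tokens generated by the model

    @property
    def total(self) -> int:
        return self.fresh_input + self.cache_read + self.cache_write + self.output


# Hypothetical run: the first step sends a 3k-token plan and caches 100k tokens
# of context; 13 follow-up steps each re-read that cached context.
steps = [StepUsage(fresh_input=3_000, cache_read=0, cache_write=100_000, output=500)]
steps += [StepUsage(fresh_input=200, cache_read=100_000, cache_write=0, output=500)
          for _ in range(13)]

total = sum(s.total for s in steps)
cache_reads = sum(s.cache_read for s in steps)
fresh = sum(s.fresh_input for s in steps)

print(f"total tokens: {total:,}")
print(f"cache reads:  {cache_reads:,} ({cache_reads / total:.0%} of total)")
print(f"fresh input:  {fresh:,}")
```

Under these assumed numbers, nearly all of the total is the same cached context counted once per step, and fresh input stays in the low thousands.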
Same here. In my past monthly cycle, March 18 to April 18, I didn’t even reach the full limits of Auto and Composer. This month, with only 12 days gone, I’ve already used 100% of Auto + Composer plus all my API credits, without really doing anything different. That’s basically three times my usual consumption in a week. By the way, I have the Pro+ plan.
Normally I use Composer 2 and never hit my limit during the month, but I’ll usually be getting pretty close by the end of the month.
This month I hit the limit within the first week, despite doing work as normal. I don’t have any specific request examples, but have been wondering what was different.
Hi Juan, I took a look at your usage, and I see that there may have been a few instances of limited cache usage. However, I’ve also sent some additional information to your DMs.
I’ve been using Composer 2 and the token consumption is alarmingly high — a single prompt burned through 89.6M tokens. It’s making me hesitant to continue using it.
It seems like the issue isn’t just increased token consumption, but also a higher price per token. I’ve gathered some stats from the /dashboard/usage page regarding the Composer model:
On April 20, I used 8.617M tokens and paid $3.42 (~$0.397 per 1M).
On May 8, I used 4.8452M tokens and paid $8.17 (~$1.686 per 1M).
And interestingly, it felt like I actually wrote significantly more code on April 20.
All in all, the difference is striking. I’m working on the exact same project in the same workflow—back in March/April, my Composer limits easily lasted a month (I didn’t even hit 50%). Now, after just a couple of days of moderate work, I’ve already burned through 10%.
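For anyone who wants to check the per-1M figures above, a minimal sketch of the arithmetic using the April 20 and May 8 numbers from this post. The `price_per_million` helper is just an illustrative name, not something from the dashboard.

```python
def price_per_million(cost_usd: float, tokens_millions: float) -> float:
    """Effective blended price in USD per 1M tokens."""
    return cost_usd / tokens_millions


apr20 = price_per_million(3.42, 8.617)    # April 20: 8.617M tokens, $3.42
may8 = price_per_million(8.17, 4.8452)    # May 8: 4.8452M tokens, $8.17
ratio = may8 / apr20

print(f"April 20: ${apr20:.3f} per 1M")
print(f"May 8:    ${may8:.3f} per 1M")
print(f"May 8 is about {ratio:.1f}x the April 20 rate")
```

This is a blended rate over all token types, so it moves whenever the mix of cache reads, fresh input, and output changes. The totals alone can't show whether the per-token price itself changed; the per-type breakdown for each day would be needed for that.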