There is a major discrepancy in prompt caching behavior between “Auto” mode and manual model selection. When using Auto mode, the system consistently fails to use prompt caching (0 Cache Read / 0 Cache Write), forcing the entire context to be resent and billed as fresh Input tokens. When manually selecting a model (like Claude 4.5 Haiku), caching works perfectly.
Steps to Reproduce
1. Use the “Auto” model selector in a project with a large context.
2. Ask a question and check the dashboard: Cache Read is 0 and costs are high.
3. Switch to a manual model (e.g., Claude 4.5 Haiku).
4. Ask a question: Cache Read/Write is active and costs are normal.
Expected Behavior
Auto mode should utilize prompt caching to avoid redundant token billing, especially on large requests.
Screenshots
Auto mode (problematic) vs. Claude 4.5 Haiku (working)
Additional Information
My company enforces Privacy Mode. The Auto mode seems to bypass the caching mechanism entirely, which is a critical financial issue when working on large repositories.
On your question: Auto mode routes requests to different models, not just Anthropic (Claude), but also GPT, Gemini, and others. The Cache Read / Cache Write columns in the dashboard only show Anthropic-style prompt caching. When Auto mode sends a request to, say, GPT-5.4, those models use different caching mechanisms (or none at all), and that doesn’t show up in the same columns. That’s why you’re seeing zeros.
When you manually pick Claude 4.5 Haiku, the request is guaranteed to go to Anthropic, where Anthropic-style caching works and shows up in the dashboard.
Current workaround: manually select a specific Claude model if caching and token savings are important for you.
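For context on what “Anthropic-style caching” means here: the client marks a prefix of the prompt with a cache-control breakpoint, and the API then reports hits and writes in separate usage fields, which is what the dashboard’s Cache Read / Cache Write columns surface. Below is a minimal sketch of that request shape and of why caching changes the bill. The model id is a placeholder, and the price multipliers are illustrative assumptions, not published rates:

```python
# Sketch of an Anthropic-style request body with a prompt-caching
# breakpoint. Only the payload shape is shown; nothing is sent.
LARGE_CONTEXT = "...repository context, many thousands of tokens..."

payload = {
    "model": "claude-4.5-haiku",  # placeholder model id
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": LARGE_CONTEXT,
            # Marks everything up to this point as cacheable; a later
            # request with the same prefix is billed as a cache read.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [{"role": "user", "content": "Question about the repo"}],
}

def request_cost(input_tok, cache_write_tok, cache_read_tok,
                 price_in=1.0, write_mult=1.25, read_mult=0.1):
    """Illustrative cost model: cache writes cost a premium over fresh
    input, cache reads a steep discount. Multipliers are placeholders,
    not actual per-model pricing."""
    return (input_tok * price_in
            + cache_write_tok * price_in * write_mult
            + cache_read_tok * price_in * read_mult)

# First request writes the cache; the follow-up mostly reads it.
first = request_cost(input_tok=500, cache_write_tok=100_000, cache_read_tok=0)
later = request_cost(input_tok=500, cache_write_tok=0, cache_read_tok=100_000)
print(later < first)  # the cached follow-up is far cheaper
```

This is why an Auto-mode request that lands on a provider without this mechanism shows up as pure Input tokens: the entire context is rebilled at the fresh-input rate on every turn.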
Thanks for the clarification, but this explanation doesn’t quite address the core of the problem for two reasons:
Cost Efficiency: I prefer using Auto mode because it is designed to be more cost-effective by routing requests to the most appropriate model. Forcing a manual selection of a “larger” model just to get caching visibility seems to defeat the purpose of an optimized Auto mode.
Inconsistency between users: This is the most confusing part. Several of my colleagues are also using Auto mode on the exact same projects, yet their dashboards clearly show active Cache Read / Cache Write and, consequently, much lower costs than mine.
If Auto mode were simply routing to models without Anthropic-style caching, my colleagues should also be seeing zeros. The fact that they get caching benefits in Auto mode while I don’t (on the same codebase) suggests a routing or configuration issue specific to my account or version, rather than just a dashboard display limitation.
Could there be another reason why Auto mode behaves differently for me compared to my peers?
Good point, that’s a different question then. If your colleagues on the same projects are seeing active Cache Read/Write in Auto mode and you’re not, that does suggest a routing difference rather than just a dashboard display issue.
A couple things would help me narrow this down:
Could you share a screenshot of one of your colleague’s dashboard rows where caching is active in Auto mode? That way I can compare the model routing on their side vs yours.
Are your colleagues on the same Cursor version (2.6.12)?
In the meantime, I’ve flagged this with the team. The inconsistency between users on the same team is worth investigating. I’ll update here if there’s any news.
As you can see, my colleague is using Auto mode and consistently getting Cache Read/Write benefits, which confirms that Auto mode is capable of routing to models that support caching for our project.
2. Colleague’s Version Information
My colleague is running a slightly different version than the one mentioned earlier. Here is their full “About Cursor” info:
It seems they are on version 2.6.18, while I was on a different build. Could this specific version (or a deployment flag associated with it) be the reason why the routing logic handles caching differently for us?
I see your colleague’s screenshot. It confirms that Auto mode can use caching in general. Nice catch.
I noticed you’re on 2.6.12, while your colleague is on 2.6.18. The request routing logic may have changed between those versions, and that could explain the difference in caching behavior.
First step: update to the latest version, at least 2.6.18, and check if anything changes. To update: Help > Check for Updates.
If caching still doesn’t work in Auto mode after the update, please share a Request ID from one of the affected requests (top-right of the chat > Copy Request ID) so I can see which model your request is being routed to.
Before updating, I manually added Claude 4.6 Sonnet to my model list. Interestingly, right after doing this, the Auto mode started utilizing prompt caching again. It seems that having a caching-compatible model explicitly enabled in the settings might have influenced the Auto mode’s routing logic.
I have since updated Cursor to version 2.6.19 to ensure everything is up to date.
The main issue is resolved—I am finally seeing “Cache Read” and “Cache Write” data in Auto mode. However, I’ve noticed that the amount of cached tokens is still significantly lower than what my colleagues are getting on similar tasks.
While the cost has dropped, the efficiency doesn’t seem fully optimized yet. I will monitor this over the next few days and provide a Request ID if the cache hit rate remains abnormally low.
Thanks for your help in pinpointing the version and routing discrepancy!
Great, glad the main issue is fixed. Nice catch about adding Claude 4.6 Sonnet. That really could affect which models Auto mode picks for routing.
About the lower cache hit rate compared to your coworkers: that can happen if Auto mode routes you and them to different models. For example, some of your requests might still be going to models that do not use Anthropic-style caching. That is normal behavior for Auto mode.
Keep an eye on it for a couple of days like you planned. If the cache rate stays noticeably lower, send me a Request ID from one of the low caching requests and I can check which model it routed to. You can copy it from the chat menu in the top right, then Copy Request ID.
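When comparing dashboards over the next few days, it can help to reduce the three columns to a single ratio rather than eyeballing raw token counts. A small helper along these lines (the inputs mirror the dashboard’s Input / Cache Read / Cache Write columns; the example numbers are made up):

```python
def cache_hit_rate(input_tokens, cache_read, cache_write):
    """Fraction of the prompt served from cache. On a warm cache,
    a healthy run should be dominated by Cache Read."""
    total_prompt = input_tokens + cache_read + cache_write
    if total_prompt == 0:
        return 0.0
    return cache_read / total_prompt

# A request that re-sent almost everything vs. a well-cached one.
cold = cache_hit_rate(input_tokens=95_000, cache_read=0, cache_write=5_000)
warm = cache_hit_rate(input_tokens=2_000, cache_read=90_000, cache_write=8_000)
print(f"cold={cold:.2f} warm={warm:.2f}")  # prints cold=0.00 warm=0.90
```

A consistently low ratio on a repeated prompt is the signal worth attaching to the Request ID you send over.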
The team knows caching in Auto mode can be improved. Your report helps with prioritization.