Auto mode: Prompt caching not working

Where does the bug appear (feature/product)?

Cursor IDE

Describe the Bug

There is a major discrepancy in prompt caching behavior between “Auto” mode and manual model selection. When using Auto mode, the system consistently fails to use prompt caching (0 Cache Read / 0 Cache Write), forcing the entire context to be resent and billed as fresh Input tokens. When manually selecting a model (like Claude 4.5 Haiku), caching works perfectly.

Steps to Reproduce

1. Use the “Auto” model selector in a project with a large context.
2. Ask a question and check the dashboard: Cache Read is 0 and costs are high.
3. Switch to a manual model (e.g., Claude 4.5 Haiku).
4. Ask a question: Cache Read/Write is active and costs are normal.

Expected Behavior

Auto mode should utilize prompt caching to avoid redundant token billing, especially on large requests.

Screenshots / Screen Recordings

Operating System

Linux

Version Information

Version: 2.6.12
VSCode Version: 1.105.1
Commit: 1917e900a0c4b0111dc7975777cfff60853059d0
Date: 2026-03-04T21:41:18.914Z
Build Type: Stable
Release Track: Default
Electron: 39.6.0
Chromium: 142.0.7444.265
Node.js: 22.22.0
V8: 14.2.231.22-electron.0
OS: Linux x64 6.17.0-1011-oem

For AI issues: which model did you use?

Auto mode (problematic) vs. Claude 4.5 Haiku (working)

Additional Information

My company enforces Privacy Mode. The Auto mode seems to bypass the caching mechanism entirely, which is a critical financial issue when working on large repositories.

Does this stop you from using Cursor

Sometimes - I can sometimes use Cursor

Hey, thanks for the report.

On your question: Auto mode routes requests to different models, not just Anthropic (Claude), but also GPT, Gemini, and others. The Cache Read / Cache Write columns in the dashboard only show Anthropic-style prompt caching. When Auto mode sends a request to, say, GPT-5.4, those models use different caching mechanisms (or none at all), and that doesn’t show up in the same columns. That’s why you’re seeing zeros.
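For reference, Anthropic's Messages API reports caching in each response's `usage` block via the `cache_read_input_tokens` and `cache_creation_input_tokens` fields, while other providers may omit those fields entirely. A minimal sketch of how a dashboard could classify a request's tokens (the field names follow Anthropic's API; the sample values are made up):

```python
def classify_usage(usage: dict) -> dict:
    """Split a response's usage block into the buckets a billing
    dashboard might show. Anthropic responses include the two cache
    fields; responses from providers without Anthropic-style caching
    may simply lack them, which renders as 0 Cache Read / 0 Cache Write."""
    return {
        "input": usage.get("input_tokens", 0),
        "cache_read": usage.get("cache_read_input_tokens", 0),
        "cache_write": usage.get("cache_creation_input_tokens", 0),
        "output": usage.get("output_tokens", 0),
    }

# Anthropic-style response: most of the prompt was served from cache.
anthropic_usage = {
    "input_tokens": 1_200,
    "cache_read_input_tokens": 48_000,
    "cache_creation_input_tokens": 0,
    "output_tokens": 350,
}

# A provider without these fields: the whole prompt bills as fresh input.
other_usage = {"input_tokens": 49_200, "output_tokens": 350}

print(classify_usage(anthropic_usage))
print(classify_usage(other_usage))
```

This is only an illustration of why the same dashboard columns can show zeros for some providers; it is not Cursor's actual billing code.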

When you manually pick Claude 4.5 Haiku, the request is guaranteed to go to Anthropic, where Anthropic-style caching works and shows up in the dashboard.

Current workaround: manually select a specific Claude model if caching and token savings are important for you.

Let me know if you still have any questions.

Hi Dean,

Thanks for the clarification, but this explanation doesn’t quite address the core of the problem for two reasons:

  1. Cost Efficiency: I prefer using Auto mode because it is designed to be more cost-effective by routing requests to the most appropriate model. Forcing a manual selection of a “larger” model just to get caching visibility seems to defeat the purpose of an optimized Auto mode.

  2. Inconsistency between users: This is the most confusing part. Several of my colleagues are also using Auto mode on the exact same projects, yet their dashboards clearly show active Cache Read / Cache Write and, consequently, much lower costs than mine.

If Auto mode were simply routing to models without Anthropic-style caching, my colleagues should also be seeing zeros. The fact that they get caching benefits in Auto mode while I don’t (on the same codebase) suggests there is a routing or configuration issue specific to my account or version, rather than just a dashboard display limitation.

Could there be another reason why Auto mode behaves differently for me compared to my peers?

Best regards,

Marc

Good point, that’s a different question then. If your colleagues on the same projects are seeing active Cache Read/Write in Auto mode and you’re not, that does suggest a routing difference rather than just a dashboard display issue.

A couple things would help me narrow this down:

  1. Could you share a screenshot of one of your colleagues’ dashboard rows where caching is active in Auto mode? That way I can compare the model routing on their side vs. yours.
  2. Are your colleagues on the same Cursor version (2.6.12)?

In the meantime, I’ve flagged this with the team. The inconsistency between users on the same team is worth investigating. I’ll update here if there’s any news.

Hi Dean,

Thanks for looking into this. Here are the details you requested to help narrow down the investigation:

1. Colleague’s Dashboard Screenshot

As you can see, my colleague is using Auto mode and consistently getting Cache Read/Write benefits, which confirms that Auto mode is capable of routing to models that support caching for our project.

2. Colleague’s Version Information

My colleague is running a slightly different version than the one mentioned earlier. Here is their full “About Cursor” info:


Version: 2.6.18
VSCode Version: 1.105.1
Commit: 68fbec5aed9da587d1c6a64172792f505bafa250
Date: 2026-03-10T02:01:17.430Z
Build Type: Stable
Release Track: Default
Electron: 39.6.0
Chromium: 142.0.7444.265
Node.js: 22.22.0
V8: 14.2.231.22-electron.0
OS: Linux x64 6.17.0-1012-oem

They are on version 2.6.18, while I am on 2.6.12. Could this specific version (or a deployment flag associated with it) be the reason why the routing logic handles caching differently for us?

I look forward to hearing what the team finds.

Best regards,

Marc

I see your colleague’s screenshot. It confirms that Auto mode can use caching in general. Nice catch.

I noticed you’re on 2.6.12, while your colleague is on 2.6.18. The request routing logic may have changed between those versions, and that could explain the difference in caching behavior.

First step: update to the latest version, at least 2.6.18, and check if anything changes. To update: Help > Check for Updates.

If caching still doesn’t work in Auto mode after the update, please share a Request ID from one of the affected requests (top-right of the chat > Copy Request ID) so I can see which model your request is being routed to.

Let me know how it goes after the update.

I have some positive updates.

Before updating, I manually added Claude 4.6 Sonnet to my model list. Interestingly, right after doing this, Auto mode started utilizing prompt caching again. It seems that having a caching-compatible model explicitly enabled in the settings might have influenced Auto mode’s routing logic.

I have since updated Cursor to version 2.6.19 to ensure everything is up to date.

The main issue is resolved: I am finally seeing “Cache Read” and “Cache Write” data in Auto mode. However, I’ve noticed that the amount of cached tokens is still significantly lower than what my colleagues are getting on similar tasks.

While the cost has dropped, the efficiency doesn’t seem fully optimized yet. I will monitor this over the next few days and provide a Request ID if the cache hit rate remains abnormally low.

Thanks for your help in pinpointing the version and routing discrepancy!

Best regards,

Marc

Great, glad the main issue is fixed. Good catch on adding Claude 4.6 Sonnet: enabling a caching-compatible model in your model list can indeed affect which models Auto mode routes to.

As for the lower cache hit rate compared to your colleagues: that can happen when Auto mode routes you and them to different models. For example, some of your requests might still be going to models that do not use Anthropic-style caching. That is normal behavior for Auto mode.
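One way to quantify this (a sketch for illustration only, not something Cursor exposes directly) is a cache hit rate over a batch of recent requests: cached reads divided by total prompt tokens. Two users routed to different model mixes will see different rates even on the same codebase. The sample numbers below are hypothetical:

```python
def cache_hit_rate(requests: list[dict]) -> float:
    """Fraction of prompt tokens served from cache across a batch of
    requests. Requests routed to providers without Anthropic-style
    caching contribute zero cached tokens, dragging the rate down."""
    cached = sum(r.get("cache_read_input_tokens", 0) for r in requests)
    fresh = sum(r.get("input_tokens", 0) for r in requests)
    total = cached + fresh
    return cached / total if total else 0.0

# Hypothetical mix: one request hit a caching model, one did not.
mixed_routing = [
    {"input_tokens": 2_000, "cache_read_input_tokens": 40_000},
    {"input_tokens": 42_000},  # routed to a non-caching provider
]
# Hypothetical all-Anthropic routing: both requests hit the cache.
all_caching = [
    {"input_tokens": 2_000, "cache_read_input_tokens": 40_000},
    {"input_tokens": 1_500, "cache_read_input_tokens": 41_000},
]

print(f"{cache_hit_rate(mixed_routing):.0%}")  # roughly half the tokens cached
print(f"{cache_hit_rate(all_caching):.0%}")
```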

Keep an eye on it for a couple of days as you planned. If the cache rate stays noticeably lower, send me a Request ID from one of the low-caching requests and I can check which model it was routed to. You can copy it from the chat menu in the top right, then Copy Request ID.

The team knows caching in Auto mode can be improved. Your report helps with prioritization.