Understanding Write Cache

Hey @Brandyn_Brosemer!

When you send a request, we build the prompt in a cache-friendly way and hand it to the underlying model provider (Anthropic, OpenAI, etc.), and the provider’s own cache is what decides whether you get a cache hit or a full re-seed. So the cache duration and refresh behavior are ultimately defined by the model provider, not by Cursor, and they don’t change based on whether you’re running locally, in a Cloud Agent, or self-hosted.

Anthropic is the only provider that has an explicit “cache write” — the first request that establishes a cached prefix costs more than a normal input token, and subsequent requests that hit that prefix are billed at the much cheaper “cache read” rate.

Cursor uses Anthropic’s default cache window of ~5 minutes, which is a sliding window: every cache hit extends it, so an actively used conversation stays warm indefinitely. If a conversation with a Claude model sits idle for more than ~5 minutes, the next turn will be a full re-seed (and a new cache write)!

One gotcha: provider caches require an exact token-prefix match, so things like switching models mid-thread, editing an earlier message, or toggling tools/rules will re-seed the cache even well inside the 5-minute window. The biggest wins for cache hits usually come from keeping the early part of the conversation stable, not from watching the clock.