Understanding Write Cache

Describe the Bug

We’re trying to better understand Cursor’s caching behavior for context window data that gets sent back to the server. Specifically, we’d like to know how the cache works, how long the cache remains active, and whether the behavior changes depending on the environment, such as local, cloud, or self-hosted deployments.

Our current understanding is that when previously cached context is resent, the cost is significantly lower than a full resend. What we’re unclear on is the length of that cache window and how best to work within it so we don’t leave a conversation open long enough to trigger a more expensive re-seeding of the cache.

Any detail you can share about cache duration, refresh behavior, and whether those timings vary by environment would be greatly appreciated.

Steps to Reproduce

Not a bug.

Expected Behavior

Not a bug.

Operating System

macOS

Version Information

Not a Cursor problem

Does this stop you from using Cursor

No - Cursor works, but with this issue

Hey @Brandyn_Brosemer!

When you send a request, we build the prompt in a cache-friendly way and hand it to the underlying model provider (Anthropic, OpenAI, etc.), and the provider's own cache decides whether you get a cache hit or a full re-seed. Cache duration and refresh behavior are therefore defined by the model provider, not by Cursor, and they don't change based on whether you're running locally, in a Cloud Agent, or self-hosted.

Anthropic is the only provider with an explicit "cache write": the first request that establishes a cached prefix is billed above the normal input-token rate, and subsequent requests that hit that prefix are billed at the much cheaper "cache read" rate.
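To make the write/read split concrete, here's a rough cost sketch. The multipliers follow Anthropic's published pricing model for 5-minute prompt caching (cache writes at ~1.25x the base input rate, cache reads at ~0.1x); the base dollar rate is a made-up placeholder, so treat the absolute numbers as illustrative only:

```python
BASE_RATE = 3.00 / 1_000_000   # hypothetical $ per input token (placeholder)
CACHE_WRITE_MULT = 1.25        # first request that seeds the cached prefix
CACHE_READ_MULT = 0.10         # subsequent requests that hit that prefix

def turn_cost(prefix_tokens: int, new_tokens: int, cache_hit: bool) -> float:
    """Cost of one request: the shared prefix is either re-seeded or read."""
    prefix_mult = CACHE_READ_MULT if cache_hit else CACHE_WRITE_MULT
    return prefix_tokens * prefix_mult * BASE_RATE + new_tokens * BASE_RATE

seed = turn_cost(50_000, 500, cache_hit=False)  # first turn: cache write
hit = turn_cost(50_000, 500, cache_hit=True)    # follow-up inside the window
print(f"seed: ${seed:.4f}  hit: ${hit:.4f}")
```

With a 50k-token prefix, the cached follow-up turn comes out roughly an order of magnitude cheaper than the seeding turn, which is why keeping the cache warm matters on long conversations.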

Cursor uses Anthropic’s default cache window of ~5 minutes, which is a sliding window: every cache hit extends it, so an actively used conversation stays warm indefinitely. If a conversation with a Claude model sits idle for more than ~5 minutes, the next turn will be a full re-seed (and a new cache write)!
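The sliding-window behavior can be sketched in a few lines. This is a toy model of the assumed semantics (each request inside the TTL is a cheap read *and* restarts the clock; an idle gap longer than the TTL forces a re-seed), not Cursor's or Anthropic's actual implementation:

```python
TTL = 5 * 60  # seconds; mirrors the ~5-minute window described above

class SlidingCacheWindow:
    def __init__(self) -> None:
        self.last_request: float | None = None

    def request(self, now: float) -> str:
        """Classify one turn as a cheap cache read or a full cache write."""
        if self.last_request is not None and now - self.last_request <= TTL:
            result = "cache read"   # warm: cheap, and the window slides
        else:
            result = "cache write"  # cold or expired: full re-seed
        self.last_request = now     # every request restarts the clock
        return result

w = SlidingCacheWindow()
print(w.request(0))    # cache write (first turn seeds the cache)
print(w.request(200))  # cache read  (within 5 min of the previous turn)
print(w.request(450))  # cache read  (window slid forward to t=200)
print(w.request(900))  # cache write (idle > 5 min since t=450)
```

The practical takeaway is the same as above: as long as turns keep landing within the window, the conversation stays warm indefinitely; it's only an idle gap that triggers the expensive re-seed.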

One gotcha: provider caches require an exact token-prefix match, so things like switching models mid-thread, editing an earlier message, or toggling tools/rules will re-seed the cache even well inside the 5-minute window. The biggest wins for cache hits usually come from keeping the early part of the conversation stable, not from watching the clock.