Understanding Write Cache

Colin · April 9, 2026, 1:21pm

When you send a request, we build the prompt in a cache-friendly way and hand it to the underlying model provider (Anthropic, OpenAI, etc.), and the provider’s own cache is what decides whether you get a cache hit or a full re-seed. So the cache duration and refresh behavior are ultimately defined by the model provider, not by Cursor, and they don’t change based on whether you’re running locally, in a Cloud Agent, or self-hosted.

Anthropic is the only provider that has an explicit “cache write”: on the first request that establishes a cached prefix, the prefix tokens are billed at a higher rate than normal input tokens, and subsequent requests that hit that prefix are billed at the much cheaper “cache read” rate.
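To make the billing difference concrete, here is a rough sketch of the arithmetic. The multipliers (1.25x base input price for a 5-minute cache write, 0.1x for a cache read) are taken from Anthropic's public pricing as I understand it, and the base price, token counts, and function name are made up for illustration, so check current pricing before relying on any of it.

```python
# Illustrative only: multipliers and prices are assumptions, not Cursor's billing logic.
CACHE_WRITE_MULTIPLIER = 1.25  # assumed: 5-minute cache write vs. base input price
CACHE_READ_MULTIPLIER = 0.10   # assumed: cache read vs. base input price


def prefix_cost(prefix_tokens: int, turns: int, base_price_per_mtok: float, cached: bool) -> float:
    """Dollar cost of sending the same prefix on every one of `turns` turns (turns >= 1)."""
    per_token = base_price_per_mtok / 1_000_000
    if not cached:
        # Without caching, the prefix is billed as normal input on every turn.
        return prefix_tokens * per_token * turns
    # First turn writes the cache; every later turn reads it
    # (assumes the cache never expires or re-seeds in between).
    write = prefix_tokens * per_token * CACHE_WRITE_MULTIPLIER
    reads = prefix_tokens * per_token * CACHE_READ_MULTIPLIER * (turns - 1)
    return write + reads


if __name__ == "__main__":
    # Hypothetical numbers: 100k-token prefix, 10 turns, $3 per million input tokens.
    print(round(prefix_cost(100_000, 10, 3.0, cached=False), 4))
    print(round(prefix_cost(100_000, 10, 3.0, cached=True), 4))
```

Under these assumed numbers the single cache write costs a bit more than one uncached turn, but it pays for itself as soon as the second turn hits the cache.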

Cursor uses Anthropic’s default cache window of ~5 minutes, which is a sliding window: every cache hit resets the timer, so an actively used conversation stays warm for as long as turns keep arriving less than ~5 minutes apart. If a conversation with a Claude model sits idle for more than ~5 minutes, the next turn will be a full re-seed (and a new cache write)!
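A small simulation may help picture the sliding window. This is a simplified model, not the provider's actual implementation: it assumes a flat 5-minute TTL, ignores prefix changes, and the function name and timestamps are hypothetical.

```python
CACHE_TTL_SECONDS = 5 * 60  # assumed ~5 minute window, as described above


def classify_turns(timestamps: list[float], ttl: float = CACHE_TTL_SECONDS) -> list[str]:
    """Label each turn 'write' (re-seed) or 'read' (cache hit), given send times in seconds."""
    labels = []
    expires_at = None  # no cache exists before the first turn
    for t in timestamps:
        if expires_at is not None and t <= expires_at:
            labels.append("read")
        else:
            labels.append("write")
        # Both a fresh write and a hit restart the window from the current turn.
        expires_at = t + ttl
    return labels


if __name__ == "__main__":
    # Turns at 0s, 120s, 400s, then a long pause until 800s.
    print(classify_turns([0, 120, 400, 800]))
    # ['write', 'read', 'read', 'write']
```

Note that the turn at 400s is still a hit even though more than 5 minutes have passed since the first turn, because the hit at 120s restarted the window. Only the 400-second gap before the last turn causes a re-seed.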

One gotcha: provider caches require an exact token-prefix match, so things like switching models mid-thread, editing an earlier message, or toggling tools/rules will re-seed the cache even well inside the 5-minute window. The biggest wins for cache hits usually come from keeping the early part of the conversation stable, not from watching the clock.
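The prefix-match rule can be illustrated with a toy comparison of two requests. This is only a sketch: real provider caches operate on tokens and provider-specific block boundaries, while here each list item stands in for a chunk of the prompt, and the function and message names are hypothetical.

```python
def shared_prefix_len(previous: list[str], current: list[str]) -> int:
    """Number of leading items that are identical in both requests."""
    n = 0
    for a, b in zip(previous, current):
        if a != b:
            break  # everything from the first difference onward cannot be reused
        n += 1
    return n


if __name__ == "__main__":
    turn1 = ["system", "rules", "msg1", "msg2"]

    # Appending a new message keeps the whole earlier request as a reusable prefix.
    appended = turn1 + ["msg3"]
    print(shared_prefix_len(turn1, appended))  # 4

    # Editing an earlier message invalidates it and everything after it.
    edited = ["system", "rules", "msg1 (edited)", "msg2", "msg3"]
    print(shared_prefix_len(turn1, edited))  # 2
```

The earlier the change sits in the conversation, the less of the prompt can be served from cache, which is why a stable start to the thread matters more than timing.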
