Can someone explain to me how cache reads and writes work?
How do I have more than 4 million tokens read from cache without having written anything to it?
Are these tokens cached by cursor locally, or in the cloud, or by the LLM provider?
Cache reads and writes are handled automatically by the AI provider and can reduce your token cost by 70-90%, depending on the provider.

When you submit a request in Chat, the prompt content is stored in the provider's cache for follow-up requests and tool calls. Each follow-up request or tool call resubmits the full history of that chat as input. With caching, however, the provider recognizes that the earlier part of the chat is already cached and does not need to process it again at the full input-token rate.
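A toy sketch of that accounting (hypothetical numbers, not Cursor's or any provider's actual billing logic): because every turn resends the whole history, the cached prefix is *read* on each turn while only the newly appended tokens are *written*, so cache reads grow much faster than cache writes over a long chat.

```python
def simulate_chat(turn_sizes):
    """Each turn resends the full history; the provider reads the cached
    prefix and only writes the newly appended tokens to the cache."""
    cached = 0          # tokens already sitting in the provider's prompt cache
    reads = writes = 0
    for new_tokens in turn_sizes:
        reads += cached        # the whole existing prefix is a cache read
        writes += new_tokens   # only the new turn is written to the cache
        cached += new_tokens
    return reads, writes

# 10 turns of 2,000 tokens each: reads grow quadratically, writes linearly.
reads, writes = simulate_chat([2000] * 10)
print(reads, writes)  # 90000 20000
```

This is why a chat with many follow-ups and tool calls can show millions of cache-read tokens against a comparatively small number of cache writes.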
You can see more details on token usage and how to reduce token consumption here: