Can someone explain to me how cache reads and writes work?
How do I have more than 4 million tokens read from cache without having written anything to it?
Are these tokens cached by cursor locally, or in the cloud, or by the LLM provider?
Cache reads and writes are handled automatically by the AI provider and can reduce your token cost by 70-90%, depending on the provider.

When you submit a request in Chat, the prompt content is stored in the provider's cache for follow-up requests and tool calls. Each follow-up request or tool call resubmits the full history of that chat as input. With caching, however, the provider recognizes that the earlier part of the chat is already cached and does not need to process it again at the full input-token rate.
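A toy sketch of that accounting (hypothetical numbers, not Cursor's or any provider's actual billing logic): because every turn resends the whole history, the cached prefix is *read* on each turn while only the newly appended tokens are *written*, so cache reads grow much faster than cache writes over a long chat.

```python
def simulate_chat(turn_sizes):
    """Each turn resends the full history; the provider reads the cached
    prefix and only writes the newly appended tokens to the cache."""
    cached = 0          # tokens already sitting in the provider's prompt cache
    reads = writes = 0
    for new_tokens in turn_sizes:
        reads += cached        # the whole existing prefix is a cache read
        writes += new_tokens   # only the new turn is written to the cache
        cached += new_tokens
    return reads, writes

# 10 turns of 2,000 tokens each: reads grow quadratically, writes linearly.
reads, writes = simulate_chat([2000] * 10)
print(reads, writes)  # 90000 20000
```

This is why a chat with many follow-ups and tool calls can show millions of cache-read tokens against a comparatively small number of cache writes.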
You can see more details on token usage and how to reduce token consumption here: