How to reduce cache reads

Hi @tiantianaixuexi, there are a few ways to reduce cache reads.

As @Ra.in says, they are important for reducing your cost with no quality loss.

Why do cache reads happen?

  • When you submit a request to the Agent, it is processed by the AI provider into tokens and you receive a response from the AI. Those tokens are cached for the next replies, which reduces their cost by 90%.
  • Each AI tool call and user request is a separate API request to the AI provider. Since only the new part of the chat or tool call needs to be processed, the cached tokens are billed at the 90% discount. The AI still processes the whole context to create a response.
  • Not using the cache would mean the same amount of tokens is billed at the full input price, roughly 9x more than the cached token cost (rough numbers in the sketch below).
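
To make that concrete, here is a minimal cost sketch, assuming cache reads are billed at about 10% of the full input price. The dollar rate and the request_cost helper are placeholders for illustration, not actual provider pricing or API:

```python
# Rough cost comparison of cached vs. uncached input tokens.
# Prices are placeholders, not actual provider pricing.
INPUT_PRICE_PER_MTOK = 3.00                           # hypothetical full input price per 1M tokens
CACHED_PRICE_PER_MTOK = INPUT_PRICE_PER_MTOK * 0.10   # cache reads billed at ~10% of that

def request_cost(cached_tokens: int, new_tokens: int) -> float:
    """Cost of one request: cached context at the discounted rate, new text at full price."""
    return (cached_tokens / 1e6) * CACHED_PRICE_PER_MTOK + (new_tokens / 1e6) * INPUT_PRICE_PER_MTOK

# A follow-up message in a long chat: 150k tokens already cached, 2k tokens of new text.
print(request_cost(cached_tokens=150_000, new_tokens=2_000))   # with cache
print(request_cost(cached_tokens=0, new_tokens=152_000))       # same tokens without cache
```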

When is it an issue?

  • If a chat thread gets too long, it carries a large context, so each request consumes a lot of accumulated tokens (see the sketch after this list).
  • A very long chat can also confuse the AI, because the context ends up holding too much conflicting information.
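
As a toy illustration of how usage accumulates: every turn re-reads the whole prior context from the cache, so cache reads keep growing as the chat gets longer. The turn sizes below are made-up numbers:

```python
# Toy illustration: each turn re-reads the whole prior context from the cache,
# so cache reads accumulate as the chat grows. Turn sizes are made-up numbers.
turn_sizes = [4_000, 3_000, 6_000, 2_000, 5_000]   # new tokens added on each turn

context = 0
total_cache_reads = 0
for i, new_tokens in enumerate(turn_sizes, start=1):
    total_cache_reads += context    # existing context is read back from the cache
    context += new_tokens           # the new turn is appended to the context
    print(f"turn {i}: context={context:,} tokens, cumulative cache reads={total_cache_reads:,}")
```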

Solution:

  • Keep each chat focused on a single task.
  • Use simpler models for simpler tasks.
  • Use a large-context model like Sonnet 4 1M only if the regular Sonnet 4 model cannot fit the required context in 200k tokens. Note that tokens over 200k cost 2x as much as tokens within the regular 200k context (rough arithmetic below).
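
For the long-context pricing, here is a minimal sketch of the arithmetic, assuming tokens beyond 200k are billed at 2x the regular input rate. The rate and the helper name are hypothetical:

```python
# Rough input-cost sketch for a long-context request (e.g. Sonnet 4 1M).
# Tokens up to 200k are billed at the regular rate, tokens beyond 200k at 2x.
# The rate is a placeholder, not actual pricing.
REGULAR_RATE_PER_MTOK = 3.00    # hypothetical price per 1M input tokens

def long_context_cost(total_tokens: int) -> float:
    regular = min(total_tokens, 200_000)
    overflow = max(total_tokens - 200_000, 0)
    return (regular / 1e6) * REGULAR_RATE_PER_MTOK + (overflow / 1e6) * REGULAR_RATE_PER_MTOK * 2

print(long_context_cost(150_000))   # fits entirely in the regular 200k window
print(long_context_cost(500_000))   # the extra 300k tokens are billed at the doubled rate
```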

Additional details about token usage:
