I would like to know how to use it to reduce my API costs. Which should I optimize for: input, output, cache write, or cache read? I noticed that in the model pricing, cache write for the Claude models is more expensive than input.
Hey, good question. I’ll go step by step.
Why cache write costs more than input: that’s normal. Anthropic charges cache writes at 1.25x the normal input price. But the cache write happens only once, on the first request in a chat. All later requests in the same chat, as long as the context doesn’t change, are charged as cache reads, which cost about 0.1x of input. So caching saves money over time; it doesn’t waste it.
Example for Opus 4.6 per 1M tokens:
- Input: $5.00
- Cache write: $6.25 (first request)
- Cache read: $0.50 (all later requests)
- Output: $25.00
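To make the arithmetic concrete, here is a rough sketch in Python using the prices listed above. It is a simplified model, not how billing is actually computed: it assumes the context stays the same size on every request, that the whole context is cached, and it ignores output tokens and any new tokens added each turn. The function names are made up for illustration.

```python
# Prices per 1M tokens, copied from the Opus 4.6 example above.
INPUT = 5.00
CACHE_WRITE = 6.25
CACHE_READ = 0.50


def cost_without_cache(context_tokens: int, requests: int) -> float:
    """Every request pays the full input price for the whole context."""
    return requests * context_tokens / 1_000_000 * INPUT


def cost_with_cache(context_tokens: int, requests: int) -> float:
    """Assumption: the first request is a cache write, all later ones are cache reads."""
    if requests <= 0:
        return 0.0
    per_million = context_tokens / 1_000_000
    return per_million * (CACHE_WRITE + (requests - 1) * CACHE_READ)


if __name__ == "__main__":
    # Hypothetical chat: 100k tokens of context, 10 requests.
    print(f"without cache: ${cost_without_cache(100_000, 10):.3f}")
    print(f"with cache:    ${cost_with_cache(100_000, 10):.3f}")
```

Under these assumptions, a single request is slightly more expensive with caching (1.25x instead of 1x), but from the second request on the cached chat is already cheaper (1.25 + 0.1 = 1.35x versus 2x).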
What you can do to save money:
- Don’t jump between different parts of the code in one chat. When the context changes, the cache resets and you pay for a new cache write. It’s better to finish one area, then move to the next.
- Don’t leave the chat idle for more than about 5 minutes. Anthropic’s cache uses a 5-minute sliding window. If the chat cools down, the next request will be a full cache write again.
- Check rules and MCP. If you have .cursorrules, .cursor/rules, or MCP servers enabled, they add context to every request, which increases the cache write size.
- Model choice matters. On more expensive models like Opus, the absolute cache write cost is higher. For simple tasks, you can use Sonnet.
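The idle-time point can be illustrated with a small simulation. This is only a sketch of the sliding-window idea described above, assuming an unchanged context and a window of exactly 300 seconds; `classify_requests` is a hypothetical helper, and real requests usually mix some cache read with some cache write.

```python
from __future__ import annotations

CACHE_TTL_SECONDS = 5 * 60  # assumption: the roughly 5-minute window described above


def classify_requests(timestamps: list[float], ttl: float = CACHE_TTL_SECONDS) -> list[str]:
    """Label each request as 'write' or 'read', given request times in seconds."""
    labels = []
    last_seen = None
    for t in timestamps:
        if last_seen is not None and t - last_seen <= ttl:
            labels.append("read")   # cache is still warm
        else:
            labels.append("write")  # first request, or the cache has expired
        last_seen = t  # every request refreshes the window
    return labels


if __name__ == "__main__":
    # Requests at 0s, 2min, 6min40s and 13min20s after the chat starts.
    print(classify_requests([0, 120, 400, 800]))
```

In this example the third request is still a read even though it comes more than 5 minutes after the first one, because the second request refreshed the window. The fourth request comes 400 seconds after the third, so it is a full write again.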
More details and calculations here: Why are cache read and write chargeable? Someone please explain, and here: Understanding Write Cache
Let me know if you still have questions.