Hey, good question. Cache write costs with BYOK can really add up, especially with your usage pattern.
Here’s why it happens: Anthropic prompt caching is prefix-based — a request can only reuse the cache if its prompt starts with an exact, token-for-token match of a previously cached prefix. When you jump between different parts of the codebase, the context diverges early in the prompt, the prefix no longer matches, and Anthropic writes the full context to cache again. Cache writes are billed at 1.25x the normal input-token price.
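To put rough numbers on it, here’s a quick sketch. The base price and the cache-read discount below are illustrative assumptions, not a current Anthropic rate card — only the 1.25x write multiplier comes from the point above:

```python
# Rough cost sketch for one 100k-token context.
# Base price and read discount are assumptions for illustration;
# the 1.25x write multiplier is Anthropic's documented surcharge.
INPUT_PRICE = 3.00        # $/MTok, assumed base input price
CACHE_WRITE_MULT = 1.25   # cache writes cost 1.25x the input price
CACHE_READ_MULT = 0.10    # assumed discount when the cached prefix is reused

context_tokens = 100_000

write_cost = context_tokens / 1e6 * INPUT_PRICE * CACHE_WRITE_MULT
read_cost = context_tokens / 1e6 * INPUT_PRICE * CACHE_READ_MULT

print(f"cache write: ${write_cost:.3f}")  # paid every time the prefix changes
print(f"cache read:  ${read_cost:.3f}")   # paid when the prefix matches
```

The gap between those two numbers is exactly why jumping around the codebase gets expensive: every jump turns a cheap read into a full-price-plus-25% write.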
A few things that can help:
Group work by code area. Instead of jumping around, try to finish work in one area before moving to another. That way the cache is more likely to be reused across requests.
Use separate chats for different tasks, but don’t bounce back and forth. Stay in one chat until you’re done.
Check your rules and MCP. If you have .cursorrules, .cursor/rules, or MCP servers enabled, they add extra context to every request. Disable what you don’t need — a smaller context makes each cache write cheaper when you do switch areas.
The model matters. Which Claude model are you using? Opus has much higher cache write costs in absolute terms because the base price is higher.
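A quick comparison makes the model point concrete. The base prices below are assumptions for illustration — check the current Anthropic pricing page before relying on them:

```python
# Same 1.25x write multiplier, very different absolute cost.
# Per-MTok base input prices are assumed for illustration only.
prices = {"opus": 15.00, "sonnet": 3.00}  # $/MTok input, assumed

context_tokens = 100_000
for model, base in prices.items():
    write_cost = context_tokens / 1e6 * base * 1.25
    print(f"{model}: ${write_cost:.3f} per full cache rewrite")
```

Under those assumed prices, every cache miss on Opus costs five times what the same miss costs on Sonnet, so the jumping-around penalty scales with the model’s base rate.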
Sadly, there’s no way right now to inspect what’s in the cache — the Anthropic API keeps it opaque to the client, so you can only infer hits and misses from the usage fields in responses.
Let me know if you want more specific optimization tips.