Reducing cache write cost

I noticed it's by far the most expensive part of my Claude API calls.

I mostly use it for planning, and I jump around between very different areas of the code.

Is there some strategy to reduce this?
It would be nice to be able to track what code is in the cache or not.

Hey, good question. Cache write costs with BYOK can really add up, especially with your usage pattern.

Here’s why it happens: Anthropic’s prompt caching works by matching prefixes. When you jump between different parts of the codebase, the context changes enough that the cached prefix no longer matches, so Anthropic writes the full context to the cache again, and cache writes cost 1.25x the normal input price.
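To make the prefix idea concrete, here’s a toy sketch. The 1.25x write and 0.10x read multipliers match Anthropic’s published caching prices, but the token counts and the $3/MTok base price are illustrative assumptions, and real caching also involves explicit cache breakpoints and minimum cacheable lengths that this ignores:

```python
# Toy model: the cache only helps where the new prompt starts with exactly
# the same tokens as the previously cached prompt. Everything after the
# first divergence is re-written to cache at 1.25x the base input price.

def common_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def request_cost(prompt_tokens, cached_prefix, input_price_per_mtok=3.0,
                 write_mult=1.25, read_mult=0.10):
    """Approximate input cost in USD for one request."""
    hit = common_prefix_len(prompt_tokens, cached_prefix)
    read_cost = hit * input_price_per_mtok * read_mult
    write_cost = (len(prompt_tokens) - hit) * input_price_per_mtok * write_mult
    return (read_cost + write_cost) / 1_000_000

# Staying in one area: long shared prefix, mostly cheap cache reads.
same_area = request_cost([1] * 90_000 + [2] * 10_000, [1] * 90_000)
# Jumping to a new area: prefix diverges immediately, full cache write.
new_area = request_cost([3] * 100_000, [1] * 90_000)
print(f"same area: ${same_area:.4f}, new area: ${new_area:.4f}")
```

The gap between the two numbers is exactly the jumping-around penalty you’re seeing.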

A few things that can help:

  1. Group work by code area. Instead of jumping around, try to finish work in one area before moving to another. That way the cache is more likely to be reused across requests.

  2. Use separate chats for different tasks, but don’t bounce back and forth. Stay in one chat until you’re done.

  3. Check your rules and MCP. If you have .cursorrules, .cursor/rules, or MCP servers enabled, they add extra context to every request. Less context means fewer cache writes when you switch areas.

  4. The model matters. Which Claude model are you using? Opus has much higher cache write costs in absolute terms because the base price is higher.

Sadly, there’s no way right now to see what’s in the cache. That’s a limitation of the Anthropic API since the cache is opaque to the client.
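That said, while the cache contents are opaque, each Messages API response does report per-request cache activity in its `usage` object (`cache_creation_input_tokens`, `cache_read_input_tokens`, `input_tokens`). A small tally over those fields makes it obvious when a jump triggered a full rewrite. Sketch below, assuming a $3/MTok base price; the dict mirrors the shape of `response.usage`:

```python
# Summarize one response's cache activity. Field names come from the
# Anthropic Messages API usage object; the price is an assumption.
def summarize_usage(usage, input_price_per_mtok=3.0):
    write = usage.get("cache_creation_input_tokens", 0)  # written at 1.25x
    read = usage.get("cache_read_input_tokens", 0)       # read back at 0.10x
    fresh = usage.get("input_tokens", 0)                 # uncached input at 1x
    cost = (write * 1.25 + read * 0.10 + fresh) * input_price_per_mtok / 1e6
    return {"cache_write_tokens": write, "cache_read_tokens": read,
            "uncached_tokens": fresh, "approx_input_cost_usd": cost}

# A request right after jumping files: almost everything is a cache write.
print(summarize_usage({"cache_creation_input_tokens": 95_000,
                       "cache_read_input_tokens": 0,
                       "input_tokens": 5_000}))
```

Logging that per request won’t tell you *what* is cached, but it will tell you exactly when a context switch cost you a write.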

Let me know if you want more specific optimization tips.