Why cache read per request can exceed the model context window in token usage events

Where does the bug appear (feature/product)?

When I check my token usage, I find that some requests have a huge cache read size, which is counted toward the total tokens. However, the cache read size exceeds the model's context window. Is that cache read necessary? Are all the cache read tokens actually sent to the model? If they are not sent to the model, does that mean they won't cost credits? Can someone explain the relationship between cache reads, input tokens, and what is actually sent to the model?
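To make the question concrete, here is a rough sketch of how I am reading one of these usage events. The field names and numbers below are just my guesses for illustration, not the real schema or my actual data:

```python
# Rough sketch of how I'm reading a single usage event.
# NOTE: these field names and values are made up for illustration,
# not the actual schema or real numbers from my dashboard.
usage_event = {
    "input_tokens": 1_200,         # fresh prompt tokens sent with this request
    "output_tokens": 350,          # tokens the model generated
    "cache_read_tokens": 450_000,  # tokens reportedly read from the prompt cache
    "cache_write_tokens": 5_000,   # tokens newly written to the cache
}

# If "total tokens" is simply the sum of all four fields, then a large
# cache read dominates the total, even though it is bigger than the
# model's context window -- which is exactly what confuses me.
total_tokens = sum(usage_event.values())
print(total_tokens)  # 456550
```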

Hey @Straka!

This post I published a little while ago might help you make sense of it.