Why cache read per request can exceed the model context window in token usage events

Where does the bug appear (feature/product)?

When I check my token usage, I find that some requests have a huge cache read size, which is counted toward the total tokens. However, the cache read size exceeds the model's context window. Is that cache read necessary? Are all the cache read tokens actually sent to the model? If they are not sent to the model, does that mean they won't cost credits? Can someone explain the relationship between cache reads, input tokens, and what is actually sent to the model?
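To make the question concrete, here is a rough sketch of how I am reading one of these usage events. The field names and numbers below are just my guesses for illustration, not the real schema or my actual data:

```python
# Rough sketch of how I'm reading a single usage event.
# NOTE: these field names and values are made up for illustration,
# not the actual schema or real numbers from my dashboard.
usage_event = {
    "input_tokens": 1_200,         # fresh prompt tokens sent with this request
    "output_tokens": 350,          # tokens the model generated
    "cache_read_tokens": 450_000,  # tokens reportedly read from the prompt cache
    "cache_write_tokens": 5_000,   # tokens newly written to the cache
}

# If "total tokens" is simply the sum of all four fields, then a large
# cache read dominates the total, even though it is bigger than the
# model's context window -- which is exactly what confuses me.
total_tokens = sum(usage_event.values())
print(total_tokens)  # 456550
```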

Hey @Straka!

This post I published a little while ago might help you make sense of it.