What the heck is going on with the Grok 4.3 cache?

Max on. Privacy on. RID: 0f9096ec-c1c6-4c33-a018-53234ed4c2b3

After ten minutes of work, a quarter of the tokens read were counted as raw (uncached) input, although I expected the input to be no more than 200k tokens.

Hi @Artemonim,

The numbers you’re seeing are expected for agent mode. The Context panel (160K) shows your current turn’s context size, but the usage dashboard shows cumulative totals across all LLM calls in the session.

In agent mode, every tool call is a separate LLM request that re-sends the full conversation context. Over a 10-minute session, that can mean dozens of individual calls. Most re-sent context hits the provider’s prompt cache (your 6.9M cache read), but some calls miss the cache when new tool results change the prompt enough that the cached prefix can’t be reused. Those misses account for the 2M uncached input.
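To make the prefix-reuse idea above concrete, here is a toy model in Python. It is only a sketch under stated assumptions: the thread does not describe how any provider actually implements prompt caching, so the function names (`common_prefix_len`, `simulate_session`), the token lists, and the rule "only the longest shared prefix with the previous request is cached" are all hypothetical simplifications.

```python
def common_prefix_len(a: list[str], b: list[str]) -> int:
    """Number of leading tokens that two prompts share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def simulate_session(prompts: list[list[str]]) -> tuple[int, int]:
    """Return (cached_tokens, uncached_tokens) summed over all requests.

    Assumption: each request can reuse only the prefix it shares with the
    previous request; everything after the first difference is fresh input.
    """
    cached = 0
    uncached = 0
    previous: list[str] = []
    for prompt in prompts:
        hit = common_prefix_len(previous, prompt)
        cached += hit
        uncached += len(prompt) - hit
        previous = prompt
    return cached, uncached


# Hypothetical three-call session; each string stands in for a block of tokens.
system = ["sys"] * 4
turn1 = system + ["user", "tool_a"]
turn2 = turn1 + ["tool_b"]               # pure append: the whole previous prompt is reusable
turn3 = system + ["summary", "tool_c"]   # history rewritten: only the system prefix matches

cached, uncached = simulate_session([turn1, turn2, turn3])
print(cached, uncached)
```

In this toy session the append-only call is almost entirely a cache hit, while the call whose earlier content changed falls back to the short shared prefix. That is the same pattern, at a tiny scale, as cumulative cache reads plus a smaller amount of uncached input.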

So the math checks out: many calls times ~140K context each, with roughly 77% hitting cache and 23% as fresh input. Your observation that “a quarter was counted as raw input” is accurate and a normal cache miss rate for an active agent session.
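For anyone who wants to redo the arithmetic, here is a short Python check using the approximate figures quoted in this thread (6.9M cache read, 2M uncached input, roughly 140K context per call). The variable names are made up for illustration, and the per-call context size is treated as constant, which is a simplification.

```python
# Approximate figures quoted in this thread.
cache_read_tokens = 6_900_000
uncached_input_tokens = 2_000_000
context_per_call = 140_000  # assumed constant per call; in practice it grows over a session

total_input = cache_read_tokens + uncached_input_tokens
hit_rate = cache_read_tokens / total_input
miss_rate = uncached_input_tokens / total_input
estimated_calls = total_input / context_per_call

print(f"total input tokens: {total_input:,}")
print(f"cache hit rate:     {hit_rate:.1%}")
print(f"cache miss rate:    {miss_rate:.1%}")
print(f"estimated calls:    {estimated_calls:.0f}")
```

With these inputs the hit rate comes out a little above 77% and the miss rate a little under 23%, over roughly 64 calls, which matches the "dozens of individual calls" estimate.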

@Colin wrote a good breakdown of how this works here: Why does Cursor consume an absurd amount of cache read tokens? (Post #24)

This situation only applies to Grok 4.3 - other models are cached correctly.

This may not need much correction, considering how rubbish the model is, but still.

You’re right that cache hit rates vary by provider. xAI handles prompt caching differently from Anthropic and OpenAI. Thanks for flagging the difference. I’ve shared this with the relevant team.

Is the model really this bad (apart from the cache issue)? *Not tested yet

I'd regret wasting money on the caching issues just to test it fully.

But when I recently ran the same task on GPT-5.1, 5.2, 5.4, Composer 2, and Grok 4.3, the GPT-5.4 evaluator rated Grok 4.3 better than 5.1 and 5.2, but Grok 4.3 decided to drop the task with a red CI — well, because “I completed the task, and refactoring wasn’t part of the deal.”

I also suddenly discovered /Canvas and Grok 4.3 edited them better than Composer 2, but again there were caching issues.

Overall, Composer 2 as a subagent under GPT-5.4’s detailed prompts is currently too good for me to need another cheap model.

I checked if the behavior changes if I disable privacy mode, and no - caching is still broken.

What about now, @Artemonim?

Well…

Oh well xp…