Using the SDK, how can I check the context window usage?

Hi everyone,

I’m trying to monitor my 200K context window usage programmatically.

I know the turn-ended event provides a usage object:

{
  "type": "turn-ended", 
  "usage": {
    "inputTokens": 1235,
    "outputTokens": 10,
    "cacheReadTokens": 1230,
    "cacheWriteTokens": 0
  }
}

However, this is an aggregate total for the entire turn (prompt + tool calls + responses).

What I actually need is the exact input token size of an individual request sent to the LLM at any specific step. Is there a way to expose or calculate this payload size so I can accurately track my context limit?

Thanks!

Hey, as of right now the SDK exposes three relevant events, and none of them covers what you want:

  • turn-ended.usage is the aggregate for the whole turn, which you already know
  • token-delta is a heuristic running count during streaming (chars/4), fine for a progress bar but not good for tracking the limit
  • step-started / step-completed include stepId and duration, but no tokens
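
For reference, a chars/4 heuristic like the one token-delta is described as using can be sketched as below. This is only an illustration: estimate_tokens is a hypothetical helper, not an SDK function, and the rounding behaviour is an assumption, since the thread only says "chars/4".

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: about one token per four characters.

    Hypothetical helper, not part of the SDK. Rounds up so that any
    non-empty text counts as at least one token (an assumption).
    """
    return (len(text) + 3) // 4


# Keep a running estimate while chunks of streamed text arrive.
running = 0
for chunk in ["Hello, ", "world! ", "This is streamed text."]:
    running += estimate_tokens(chunk)

print(running)
```

Real tokenizers do not split text at fixed character widths, so an estimate like this can drift noticeably from the true count. That is why it suits a progress indicator but not limit tracking.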

The SDK currently doesn’t expose the exact input token size of a single LLM request inside a multi-step turn. This is a valid feature request, and I’ll log it internally for the Async Agents team so they can add per-step usage to step-completed (or as a separate event). I can’t give an ETA.

As a temporary workaround for detecting when you are getting close to 200K, you can treat inputTokens + cacheReadTokens + cacheWriteTokens from turn-ended as a rough estimate of the context size at the end of the turn (the last step is usually closest to the peak). It’s not exact, but it gives you an approximate checkpoint between turns.
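
A minimal sketch of that sum, using the usage payload from the question. The helper name turn_input_total and the CONTEXT_LIMIT constant are made up for illustration, and the code assumes the three counters are disjoint, which the SDK may or may not guarantee.

```python
import json

CONTEXT_LIMIT = 200_000  # context window size mentioned in this thread


def turn_input_total(event: dict) -> int:
    """Sum the input-side counters of a turn-ended event.

    Assumes the three fields are disjoint counts (not confirmed here).
    """
    usage = event["usage"]
    return (
        usage["inputTokens"]
        + usage["cacheReadTokens"]
        + usage["cacheWriteTokens"]
    )


raw = """
{
  "type": "turn-ended",
  "usage": {
    "inputTokens": 1235,
    "outputTokens": 10,
    "cacheReadTokens": 1230,
    "cacheWriteTokens": 0
  }
}
"""
event = json.loads(raw)
total = turn_input_total(event)
print(total, f"{total / CONTEXT_LIMIT:.1%} of the window")
```

Keep in mind that for a turn with several LLM requests this number is a sum across all of them, not the size of any single request.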

I tried that. If the turn involves many tool calls, inputTokens + cacheReadTokens + cacheWriteTokens can easily be much larger than 200K, because each turn contains many LLM requests.

Fair point, my workaround doesn’t work here. In a multi-step turn, each LLM call re-sends almost the whole context, and the turn-ended total counts those re-sent tokens again for every call. So the total can easily go over 200K, even if the actual peak context was lower.
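
To make the overcounting concrete, here is a small simulation with invented numbers (they do not come from the SDK or from a real session). It assumes every step re-sends the full context so far and that each step adds a fixed number of tokens, for example a tool result.

```python
def simulate_turn(base_context: int, added_per_step: int, steps: int):
    """Return (aggregate_input, peak_input) for one simulated turn.

    Step i sends the base context plus everything added by the
    previous i steps, so the per-step input grows linearly.
    """
    per_step_inputs = [base_context + i * added_per_step for i in range(steps)]
    return sum(per_step_inputs), max(per_step_inputs)


aggregate, peak = simulate_turn(base_context=50_000, added_per_step=2_000, steps=10)
print(aggregate, peak)  # 590000 68000
```

In this example the aggregate is 590,000 tokens while the largest single request is only 68,000, so the turn-level sum says little about how close any one request came to the 200K limit.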

Honest answer: with the current SDK, you can’t reliably calculate context window usage between steps. We’d need per-step usage in step-completed (or a separate event), and that doesn’t exist yet.

I’ll log this as a feature request for the team. I can’t share an ETA. If there’s an update, we’ll reply in the thread.