However, this is an aggregate total for the entire turn (prompt + tool calls + responses).
What I actually need is the exact input token size of an individual request sent to the LLM at any specific step. Is there a way to expose or calculate this payload size so I can accurately track my context limit?
Hey, as of right now the SDK gives three token-related events, and none of them cover what you want:
turn-ended.usage is the aggregate for the whole turn, which you already know
token-delta is a heuristic running count during streaming (chars/4), fine for a progress bar but not accurate enough for tracking the context limit
step-started / step-completed include stepId and duration, but no tokens
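To show why a chars/4 count is only a rough guide, here is a minimal sketch of that kind of heuristic. This is not the SDK's code: the function name estimate_tokens, the chars_per_token parameter, and the round-up behavior are my own assumptions for illustration.

```python
def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Rough token estimate: character count divided by chars_per_token, rounded up.

    Hypothetical helper, not part of the SDK. Real tokenizers can differ a lot
    from this, especially for code, non-English text, and tool-call JSON.
    """
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    # Ceiling division without importing math.
    return -(-len(text) // chars_per_token)


if __name__ == "__main__":
    print(estimate_tokens("hello world"))  # 11 characters -> 3
```

An estimate like this can drift noticeably from the real tokenizer count, which is why it is fine for a progress bar but not for deciding whether a request still fits in the window.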
The SDK currently doesn’t expose the exact input token size of a single LLM request inside a multi-step turn. This is a valid feature request. I’ll log it internally for the Async Agents team so they can add per-step usage to step-completed (or as a separate event). I can’t give an ETA.
As a temporary workaround for tracking how close you are to the 200K limit, you can treat inputTokens + cacheReadTokens + cacheWriteTokens from turn-ended as a lower bound for the context size at the end of the turn (the last step is usually closest to the peak). It’s not perfect, but it gives a rough boundary between turns.
I tried that. If the turn involves many tool calls, inputTokens + cacheReadTokens + cacheWriteTokens can easily be much larger than 200K, because each turn contains many LLM requests.
Fair point, my workaround doesn’t work here. In a multi-step turn, each LLM call re-sends almost the whole context, and the turn-ended total counts that re-sent context again for every call. So the total can easily go over 200K, even if the actual peak context was lower.
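A quick worked example of that overcounting, using made-up numbers rather than real SDK output. The per-step sizes below are hypothetical: a turn with five LLM requests where the context starts at 40K tokens and grows by 10K per step.

```python
# Hypothetical input size (in tokens) of each LLM request within one turn.
# These values are invented for illustration; the SDK does not expose them today.
step_input_tokens = [40_000, 50_000, 60_000, 70_000, 80_000]

# A turn-level total adds up every request, so re-sent context is counted each time.
aggregate = sum(step_input_tokens)

# What actually has to fit in the context window is the largest single request.
peak = max(step_input_tokens)

print(aggregate)  # 300000
print(peak)       # 80000
```

With these numbers the turn-level sum is 300K, well past 200K, while the largest single request is only 80K. The sum alone can't tell you which of those two situations you are in.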
Honest answer: with the current SDK, you can’t reliably calculate the context window between steps. We’d need per-step usage in step-completed (or a separate event), and that doesn’t exist yet.
I’ll log this as a feature request for the team. I can’t share an ETA. If there’s an update, we’ll reply in the thread.