Context
We’re building automation on top of `cursor agent --print --output-format stream-json --stream-partial-output` and need to parse its output correctly.
Problem
In a single session, `assistant` events appear in at least four different forms, but the documentation describes only one:
// Form 1: no model_call_id, has timestamp_ms
{"type":"assistant","message":{"content":[{"type":"text","text":"Reading..."}]},"timestamp_ms":1774846267524}
// Form 2: has model_call_id, has timestamp_ms — same text as Form 1
{"type":"assistant","message":{"content":[{"type":"text","text":"Reading..."}]},"model_call_id":"a594...","timestamp_ms":1774846267693}
// Form 3: has timestamp_ms, small delta text fragments
{"type":"assistant","message":{"content":[{"type":"text","text":"RCH]\n\n`offbo"}]},"timestamp_ms":1774846272384}
// Form 4: no timestamp_ms, full cumulative text of the entire response
{"type":"assistant","message":{"content":[{"type":"text","text":"<entire response here>"}]}}
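One quick way to inventory the shapes in a captured session is to key each `assistant` event by which of `model_call_id` and `timestamp_ms` it carries. This is a minimal Python sketch, assuming the stream is one JSON object per line; it reuses the four sample lines above (text translated to English), and `shape` and `classify` are illustrative names, not part of any Cursor tooling. Forms 1 and 3 produce the same key, so field presence alone cannot tell them apart.

```python
import json
from collections import Counter

# The four sample events above, one JSON object per line (assumed NDJSON framing).
SAMPLE = """\
{"type":"assistant","message":{"content":[{"type":"text","text":"Reading..."}]},"timestamp_ms":1774846267524}
{"type":"assistant","message":{"content":[{"type":"text","text":"Reading..."}]},"model_call_id":"a594...","timestamp_ms":1774846267693}
{"type":"assistant","message":{"content":[{"type":"text","text":"RCH]\\n\\n`offbo"}]},"timestamp_ms":1774846272384}
{"type":"assistant","message":{"content":[{"type":"text","text":"<entire response here>"}]}}
"""


def shape(event: dict) -> tuple[bool, bool]:
    # Key an event by (has model_call_id, has timestamp_ms).
    return ("model_call_id" in event, "timestamp_ms" in event)


def classify(lines) -> Counter:
    # Count assistant events per field-presence shape, skipping blank lines.
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue
        event = json.loads(line)
        if event.get("type") == "assistant":
            counts[shape(event)] += 1
    return counts


counts = classify(SAMPLE.splitlines())
print(dict(counts))  # {(False, True): 2, (True, True): 1, (False, False): 1}
```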
The documentation only says:
Concatenate all `message.content[].text` values to reconstruct the complete response.
Following this literally produces duplicated text: Forms 1 and 2 carry the same content, and Form 4 repeats the entire response again.
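The duplication is easy to reproduce. The Python sketch below applies the documented rule literally to a hypothetical, hand-built stream (one event each in the Form 1, Form 2 and Form 4 shapes) for a response whose whole text is `Reading...`; `naive_concat` and `STREAM` are illustrative names, and the stream is constructed rather than captured.

```python
import json


def naive_concat(lines):
    """Literal reading of the docs: join every message.content[].text value."""
    out = []
    for line in lines:
        event = json.loads(line)
        if event.get("type") != "assistant":
            continue
        for part in event.get("message", {}).get("content", []):
            out.append(part.get("text", ""))
    return "".join(out)


# Hypothetical stream for a response whose full text is just "Reading...".
STREAM = [
    '{"type":"assistant","message":{"content":[{"type":"text","text":"Reading..."}]},"timestamp_ms":1}',
    '{"type":"assistant","message":{"content":[{"type":"text","text":"Reading..."}]},"model_call_id":"a594","timestamp_ms":2}',
    '{"type":"assistant","message":{"content":[{"type":"text","text":"Reading..."}]}}',
]

print(naive_concat(STREAM))  # Reading...Reading...Reading...
```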
Questions
- Is there an official parsing reference or sample parser for `--stream-partial-output` that handles all of these forms?
- What are the intended semantics of each form? Specifically:
  - Should Form 1 (no `model_call_id`) be skipped in favor of Form 2?
  - Should Form 4 (no `timestamp_ms`) always be skipped?
- Are there other forms we haven’t encountered yet?