How should reasoning content and encrypted reasoning content be passed in BYOK responses

Hi there,

I have currently built a proxy to transform GPT 5.4/5.5 and use my subscription in Cursor’s BYOK. Cursor’s BYOK sends a responses request, but expects a completions response, which doesn’t contain these as per most usage.

How should this be handled or sent? I looked at various projects on github, but I couldn’t see anything about the same.

Thanks!

Hey, thanks for the question. This is the well-known BYOK mismatch. For GPT-5 family models, Cursor builds a payload in the Responses API format (input, reasoning, include), but sends it to /chat/completions and then parses the reply as Chat Completions streaming chunks. There’s no official ETA for a native fix yet.

On the proxy side, the general workaround is:

  • Forward the request to the upstream /v1/responses endpoint, not /chat/completions.
  • On the way back, translate the Responses SSE events into Chat Completions chunks (choices[].delta), otherwise Cursor won’t render anything.

The quickest reference for handling the reasoning fields specifically, including how to pass reasoning across formats, is this community bridge. It covers your exact case: GitHub - Pil0tXia/Cursor-OpenAI-BYOK-Bridge: Bridge Cursor OpenAI-compatible BYOK requests to Responses API endpoints. · GitHub

Note: this isn’t an official solution and we don’t support it, but it works well as a workaround.

Full context with example request and response shapes from other users is here: Cursor Agent sends Responses API format to /chat/completions endpoint

If you get stuck on specifics, share an example payload and we can take a look.

Hey, thank for the quick response on this!

I do have a few additional questions, since after reviewing that code it still didn’t answer all my questions.

The above repository you sent seems to be a non-streaming responses transformation, but that’s okay to understand how it works, however deltas are to be transformed differently.

Here’s what my proxy does currently when it comes to upstream deltas to transform them to choice deltas:

  • Plain live reasoning text: upstream response.reasoning.delta → clientdelta.reasoning_content

  • Encrypted reasoning blob: upstream response.output_item.added with reasoning item → client delta.reasoning.encrypted_content, but only if requested via include

  • Reasoning summary text: upstream response.reasoning_summary_text.delta → client delta.reasoning_summary_text

Based on the repo you sent, I’m guessing reasoning_summary_text needs to be changed to response.reasoning_summary_text.deltadelta.reasoning_content. But, what happens to encrypted reasoning blob? Cursor requests data with store: false, but doesnt’t use the encrypted reasoning blob I pass, so how should I be passing it?

Thanks for the details. One important note up front: BYOK with the GPT-5 family via the Responses format is that known mismatch, and there’s no official proxy spec for it. We don’t officially support this setup, so what follows isn’t a guaranteed contract, it’s just what we’ve observed in practice.

On the main question about the encrypted reasoning blob: what you’re seeing (Cursor sends store: false, but the passed encrypted blob doesn’t get picked up) is a known weak spot. Passback reasoning or encrypted content for BYOK in multi-turn is currently unreliable, so even a correctly returned blob may not be used, and continuity between turns can break. There’s no ETA for a native fix.

In practice this isn’t critical for rendering: mapping the live reasoning text into delta.reasoning_content is enough for Cursor to show the reasoning, and other users in the thread below confirmed that. Round-tripping the encrypted blob in its current form isn’t guaranteed, so I wouldn’t rely on it for multi-turn continuity yet.

The closest public material to your case is how the community bridge handles streaming deltas: GitHub - Pil0tXia/Cursor-OpenAI-BYOK-Bridge: Bridge Cursor OpenAI-compatible BYOK requests to Responses API endpoints. · GitHub, plus the general thread with request and response payload examples from other users: Cursor Agent sends Responses API format to /chat/completions endpoint

If you share a concrete example, the request from Cursor and what you return, especially what the assistant message looks like on the second turn, we can look together and see exactly where the blob gets lost.

Dean’s read matches what I’ve seen from the proxy side.

For Cursor rendering, I’d focus on visible reasoning first. In practice that means getting stream deltas like this back to Cursor:

{“choices”:[{“index”:0,“delta”:{“reasoning_content”:“…”}}]}

and keeping the actual answer text in delta.content.

I wouldn’t bet on encrypted reasoning for multi-turn yet. With store:false, whether those chunks are useful seems to depend on the route, the model, and what Cursor sends on the next turn. So I’d log three things side by side:

Cursor’s request (store, reasoning, include)
The raw upstream Responses events
The exact Chat Completions SSE chunks your proxy emits

Disclosure: I maintain Wappkit (https://api.wappkit.com), so I’ve been testing this against our gateway too. On my side, GPT-5.5 returns reasoning_content in both non-stream and stream. GPT-5.4 was inconsistent depending on base URL / stream mode, and the Claude Code / Opus 4.7 route did not expose it at all.

So my mental model is: reasoning_content can work, but only after you verify the raw JSON/SSE for that specific model route. I would not assume OpenAI-compatible alone is enough.

If you can paste one upstream event and the chunk you turn it into, I can sanity-check the mapping.

I totally agree with what @alice_builds_ai wrote, it matches what we’re seeing too.

The practical rendering priority is: map the live reasoning text into delta.reasoning_content, and keep the actual answer in delta.content. That’s enough for Cursor to show the reasoning. For reasoning_summary_text, it also makes sense to collapse it into reasoning_content, like you guessed from the repo.

About the encrypted reasoning blob: what you’re seeing (Cursor sends store: false, but the encrypted blob you pass in doesn’t get picked up) is a known weak spot. Passback reasoning and encrypted content for BYOK in multi-turn is unreliable right now, and even a correctly returned blob might simply not be used, so continuity between turns breaks. There’s no native fix ETA yet, so I wouldn’t rely on round-tripping the encrypted blob for multi-turn in its current form.

If you share a concrete example, the Cursor request (with store, reasoning, include), the raw upstream Responses events, and the exact Chat Completions SSE chunks your proxy returns, especially what the assistant message looks like on the second turn, we can look together at where exactly the blob gets lost.

request-000084-1780329857490.zip (541.3 KB)

Here’s a request along with its transformed raw request, transformed request (to upstream), and response chunks events that I send back to the client (Cursor), For some reason, even when sending the summary in `reasoning_content`, the reasoning traces are still not visible as the expandable thoughts with thinking time in the UI. I’d love some pointers as to where I am going wrong in this.

Thanks for the log, I went through it step by step. Let me clear up a couple of suspects so we don’t dig in the wrong direction.

First, your chunk order is clean. The reasoning_content deltas are incremental and fully finish before the first content delta. Nothing is mixed, and nothing gets dumped as one blob at the end. The format is also correct Chat Completions streaming (chat.completion.chunk, choices[].delta). So it’s not the order or the structure.

Second, about the channel in general. In base URL override mode, Cursor parses the proxy response as Chat Completions streaming chunks (choices[].delta), not as Responses events. So reasoning_summary_text.deltadelta.reasoning_content is the right mapping, and we should stick with it.

And an important point. You definitely have data for the panel. usage.completion_tokens_details.reasoning_tokens is 324, the model really did think. So the problem isn’t missing reasoning. It’s purely about how the proxy returns that channel.

Now the main suspicion. In one stream you have two reasoning channels at the same time:

  • delta.reasoning is an object with encrypted_content
  • delta.reasoning_content is a string with visible text

A lot of Chat Completions parsers expect delta.reasoning to be a string (OpenRouter convention). When they get an object like { encrypted_content } with no text, the reasoning phase can break or get swallowed. Then the thoughts panel with the timer won’t open, even though content still renders fine.

Cleanest test to isolate it. Remove the delta.reasoning chunk entirely and keep only the delta.reasoning_content deltas. The encrypted blob isn’t needed for rendering, and with store: false its passback is unreliable anyway, continuity across turns on multi turn isn’t guaranteed right now. So you’re not really losing anything.

Then please send a new client_chunks dump after removing delta.reasoning. That way we can both confirm it’s just rolereasoning_content x N → content x N → tool_callsfinish_reason, and we won’t have to guess. If the panel with the timer shows up, it was the two channel conflict.

If the panel still doesn’t show up after removal, then we should check which exact field the current BYOK parser reads. Also, they recently shipped a server side routing fix for BYOK with gpt-5.5, details here GPT-5.5 BYOK not working - #44 by deanrie. Update to the latest stable and recheck. You might not need some of the proxy glue anymore, and the reasoning field behavior may have changed.

I think we can ignore the encrypted_content problem for now. I have solved it by adding a cache and injecting it based on previous message IDs myself.

However, I still couldn’t get the reasoning display to work. I instead chose to try another model (deepseek v4 flash), which doesn’t have encrypted_content, and natively emits reasoning_content.

request-000004-1780444013977.zip (138.6 KB)

Has any user been able to successfully create an endpoint where reasoning content is actually visible in the UI? I am on Cursor 3.6.31 which should be the latest. My proxy supports both v1/models and /v1/chat/completions correctly.

Hey, I went through the whole dump and your stream is clean, there’s nothing to fix on the proxy side.

Specifically:

  • The order is strictly role -> reasoning_content×N -> content×N -> tool_calls -> finish_reason, with no mixing.
  • reasoning_content comes as a string, which is correct. There’s no delta.reasoning with encrypted_content in the stream, so there’s no dual-channel conflict that could break rendering on gpt-5.5.
  • The format is correct (chat.completion.chunk, choices[].delta), and the model really did think: usage.completion_tokens_details.reasoning_tokens = 2701.

So the data for the panel is there, and the stream is being sent correctly. This looks like it’s not a mapping issue and not a chunk ordering issue, but whether the client actually shows the reasoning panel for this specific model. That matches what @alice_builds_ai said in post #8: the visibility of reasoning_content depends on the exact route/model, not just on the field being present. Right now there isn’t a universal endpoint where reasoning is guaranteed to be visible for any custom model.

To isolate it, there’s a simple control test: run the same proxy and the same stream, but under a model name that Cursor already recognizes as a reasoning model (for example, compare against a gpt-5.5 route where Alice saw the panel). If the panel with the timer appears under a known name with an identical stream, then the issue is model name recognition for oc_deepseekv4flash, not your chunks.

Send the test result and we’ll know if it’s the model name or not. Also, since you already handled encrypted_content via your own cache by message ID, that’s a reasonable workaround. I wouldn’t rely on multi-turn continuity for BYOK in its current state yet.