I’m using GPT-5.5 High in Cursor Version: 3.8.11 (Universal). Since sometime yesterday, the model started taking an extremely, extremely long amount of time to generate tokens in some contexts. For example, I observed a patch section being generated at less than one token per second - literally, waiting a few seconds, seeing a token come through, waiting a few more seconds, seeing another token come through.
I would estimate that it’s a 100-times slowdown versus normal token generation. This is making Cursor unusable.
Interestingly, thinking tokens sometimes seem to come through at full speed.
Steps to Reproduce
Choose gpt-5.5 High in Cursor, ask it to work on a decent sized codebase, observe extremely slow token generation during code writing.
Expected Behavior
I expect tokens to be generated at the typical rate.
The pattern you’re describing - reasoning streaming at normal speed but the code/patch output crawling to ~1 token/sec, and appearing suddenly - usually points to a network or connection issue between your machine and our servers rather than the model itself. This doesn’t appear to be a widespread issue on our end right now, which fits that.
A few things to try, in order:
Run Network Diagnostics: Cursor Settings → Network → Run Diagnostics. This tests your connection to our servers and flags anything affecting streaming. (Network troubleshooting)
If you’re behind a corporate proxy or VPN: in the same Network settings, set HTTP Compatibility Mode to HTTP/1.1 and restart Cursor. Some proxies throttle or buffer HTTP/2 streaming, which slows token-by-token output specifically.
Isolate the network: try a different connection for a few minutes (e.g., a phone hotspot). If the slowdown disappears, that confirms the bottleneck is the network path.
If it’s still slow after that, grab a Request ID from one of the slow chats (open the chat → “…” menu → Copy Request ID) and paste it here so we can trace the timing on our side. Connectivity issues are debuggable even with Privacy Mode on. (Finding a request ID)
GPT-5.5 also has a higher-throughput “Fast” option in the model picker (at a higher per-token cost), but if the root cause is the network path that won’t make a difference - so I’d start with the diagnostics above.