Hey, a few others have mentioned similar things recently. There’s a related thread here: “Opus 4.6 was fun for 1 week..... why?!”
One thing to keep in mind: Opus 4.6 in Max Mode uses extended thinking, which means the model does extra reasoning steps before producing the final output. This naturally increases response time compared to non-Max or non-thinking variants. That said, if it feels noticeably slower than before, we’d want to look into it.
To dig deeper, I’ll need:
- Request IDs from the slow sessions (top right corner of the chat > Copy Request ID). Even 2 or 3 examples would help.
- Roughly how long the responses are taking. Are we talking 30 seconds, or a minute or more?
- Whether you’re seeing the same slowness with other models, for example Sonnet 4.6 or non-Max Opus.
A couple of things to try in the meantime:
- Start a fresh chat for new tasks instead of continuing long threads. Accumulated context can increase latency significantly.
- Try Opus 4.6 without Max Mode to see how large the speed difference is.
Share those request IDs and I can check what’s happening on our end.