Extreme fluctuations in Composer 2 performance

Where does the bug appear (feature/product)?

Cursor IDE

Describe the Bug

Three days ago, Composer 2 was genuinely blowing me away. It was definitely outperforming GPT 5.4 and Opus.

Today I put Composer 2 to work and it feels like I am coding with Llama 4. It keeps going in circles, doesn’t do what I tell it, and does cheap duct-tape fixes on bugs.

I have a creeping suspicion this is because of the continued pre-training Cursor does on its model, pushing new versions every 5 hours. I can’t rely on a model if I have no idea if it will be the same tomorrow.

Of course I don’t know if this is the reason, but it doesn’t seem impossible.

Steps to Reproduce

It’s an AI model, so it’s non-deterministic.

Expected Behavior

I would like Composer 2 to perform more or less the same from day to day.

Operating System

MacOS

Version Information

Version: 3.2.11 (Universal)
VSCode Version: 1.105.1
Commit: e9ee1339915a927dfb2df4a836dd9c8337e17cc0
Date: 2026-04-24T14:36:47.933Z (4 days ago)
Layout: glass
Build Type: Stable
Release Track: Default
Electron: 39.8.1
Chromium: 142.0.7444.265
Node.js: 22.22.1
V8: 14.2.231.22-electron.0
OS: Darwin arm64 25.4.0

For AI issues: which model did you use?

Composer 2

For AI issues: add Request ID with privacy disabled

Request ID: 2440666d-1084-427e-9add-bca1c2c7b83c

Does this stop you from using Cursor

No - Cursor works, but with this issue

Hey, thanks for the report and the Request ID. I checked the session.

First, to clear up a myth: the Composer 2 model is not retrained every 5 hours. The checkpoint is static. Routing and infra can change sometimes, but the model itself is the same day to day.

What actually happened in your session:

  • A bug triggered where some non-English characters slipped through. This is an issue we were already tracking, and it looks like it regressed. I’ll pass it to the team.
  • The model returned an empty response a few times.
  • The session hit a usage limit, which could also affect the flow.
  • The conversation context was pretty large. In long sessions, quality drops a lot, so it helps to split work into new chats.

So the feeling that the model was acting dumb in this case was not placebo; there were real issues. But it’s not daily model drift, it’s specific session factors.

If it happens again, send the Request ID and I’ll take a look. There’s also a megathread for Composer 2 feedback if you want to add notes there: Share your experience with Composer 2!

I appreciate you looking into it. Thanks for clearing that up!