Share your Thoughts on Composer 2.5!

I’ve been using Composer 2.5 over the last 5 days, giving it some fresh projects to work on as well as having it participate in a more complex project repository. It’s been a pretty positive experience. I’ve been extremely averse to using anything other than frontier models (my “Auto + Composer” pool never goes beyond a few % usage per month), but it looks like, moving forward, I can start to utilize Composer more. Now is probably a great time for me to start putting Composer to work on more autonomous things like bug identification, spotting refactoring opportunities, etc. With the Cursor SDK, it sounds like I might be able to do some really cool things now.

Well, Composer 2 only worked well for me as a subagent under GPT-5.4, which sent it prompts that didn’t fit on the screen. So…

Is it really 90% off? Why did I still spend so much money?
It seems that cost = 0.5 x input + 0.3 x cached input + 2.5 x output, so there is no 90% off, and the cached input price is too high.
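To make that formula concrete, here is a minimal sketch of the arithmetic. It assumes the three coefficients are USD per million tokens, which the post does not state, and the token counts in the usage note are made up for illustration. The function names are hypothetical too.

```python
# Coefficients copied from the formula in the post. Treating them as
# USD per one million tokens is an assumption, not a confirmed price list.
RATES_PER_MILLION = {"input": 0.5, "cached_input": 0.3, "output": 2.5}


def estimate_cost(input_tokens, cached_input_tokens, output_tokens):
    """Return the estimated cost in USD for one request."""
    tokens = {
        "input": input_tokens,
        "cached_input": cached_input_tokens,
        "output": output_tokens,
    }
    return sum(RATES_PER_MILLION[kind] * count / 1_000_000
               for kind, count in tokens.items())


def cached_share(input_tokens, cached_input_tokens, output_tokens):
    """Return the fraction of the estimated cost that comes from cached input."""
    total = estimate_cost(input_tokens, cached_input_tokens, output_tokens)
    if total == 0:
        return 0.0
    cached = RATES_PER_MILLION["cached_input"] * cached_input_tokens / 1_000_000
    return cached / total
```

For a hypothetical 1M-token request made of 50,000 fresh input tokens, 900,000 cached input tokens, and 50,000 output tokens, `estimate_cost(50_000, 900_000, 50_000)` comes to about $0.42, and roughly 64% of that is cached input. Under these assumptions, cache-heavy requests are dominated by the cached input rate, which is the concern raised here.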

been loving composer 2.5 so far; going ham with it on my box running a bunch of composer 2.5 subagents :smiley:

That’s included, so you are not paying that price. You can check the number of millions of tokens and % usage of your subscription to see that the prices are indeed 10X smaller.

For the past few days I had a very positive experience with Composer-2.5 (not fast). It would routinely knock out tasks on the first try. It made solid architecture decisions and did well both planning on its own and implementing a plan generated by another model (e.g. Claude). The ‘token usage’ appeared high in the usage dashboard (1M+ for most requests), but 90%+ of that is cache reads, which seems reasonable.

Today I noticed what feels like a major regression.

My first thread of the day kicked off with Cursor stubbing two functions and not implementing 99% of the (very detailed) spec I passed in.

I just had another experience (~30m ago) where it re-implemented a large portion of my app that was recently refactored to be rock solid, with thorough documentation and test coverage. To try to summarize, I’ve got a feature that captures logs for my creative tool (player, sculpting, placement), and it added another namespace (input capture) but appeared to bypass most of my architecture. I’ve spent ~4 prompts trying to clean it up and get it to just implement what was spec’d and follow the existing architecture, but I’m still finding divergence.

Not sure if something changed, but I’m about to pause work for today because I’m finding I’m fighting the tool more than making progress.

Sorry that has been the case for you today @iolo. Are you in privacy mode? If not, would you mind sharing the request ID of the chat in question so that the team can look into it and potentially see the exact regression you are talking about?

Hi there, apologies, I’m in privacy mode, but I can share the request ID if helpful, along with an example of the quick back-and-forth from the first issue (the second one was sent via the SDK and I don’t currently persist those requests).

I don’t log the Cursor request ID, unfortunately. I’ve just got a timestamp and the back-and-forth of prompts with the agent (e.g. first prompt, initial agent response, follow-up prompt, second agent response). It’s long, but I can attach it if helpful.
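For anyone in the same spot, a minimal sketch of persisting each exchange locally could look like the following. It deliberately makes no Cursor SDK calls: `log_exchange`, `load_exchanges`, and the record fields are hypothetical names, and `request_id` is whatever identifier your own client code has on hand (or `None` if it has none).

```python
import json
import time
from pathlib import Path


def log_exchange(log_path, prompt, response, request_id=None):
    """Append one prompt/response pair to a JSON Lines file and return the record."""
    record = {
        # Local wall-clock time with UTC offset, e.g. 2025-01-01T12:00:00+0000
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S%z"),
        # Hypothetical field: pass in an ID only if your client exposes one.
        "request_id": request_id,
        "prompt": prompt,
        "response": response,
    }
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def load_exchanges(log_path):
    """Read every logged exchange back, in the order it was written."""
    with Path(log_path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

One JSON object per line keeps the file append-only, so a crashed run loses at most the exchange in flight, and the whole back-and-forth can be attached to a report as a single file.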

It’s totally fine, no worries! I won’t be able to analyse them directly, but if you send them, the team will be able to dig into the issue and maybe pinpoint the underlying cause!

Ok will do! Is there a place for me to send them beyond this (pretty active) forum thread?

If you read my experience in this thread: it either one-shots the first prompt or it probably won’t get there.

If you end up in a convoluted discussion where things are not clearly working, ask for a prompt rewrite, and start a new chat, with that improved one-shot prompt.

I started using Composer 2.5 via the SDK in a project that is a chatbot with personality and decision-making (not OpenClaw) - its responses are much more human-like than those of Qwen 3.6 Plus.

Before this, I was worried about how to set up the harness without overdoing the character traits, but after changing the model, it became perfect. :heart_eyes:

I’ve used Composer 2.5 for a few days now on challenging WebGPU coding tasks. Underwhelming. It “fixes” bugs by hiding symptoms. It generally seems to “think” less than competing models and comes up with inferior, half-baked solutions. It has trouble following the code structure of an existing project, spreading functionality across files in a way that is partially correct and partially random. The claimed percentage parity with Opus does not, imho, reflect reality.

Okay, so Composer 2.5 is nerfed?

Composer 2.5 was working great for the past few days on highly complex tasks; today it fails badly.

Using it today, I’ve noticed it seems to miss or ignore overall architectural considerations even though I explicitly spelled them out. When I ask it to implement stuff, it does it half-assed - for example, a config object could be empty, or a function might be only partially implemented, if at all.

I tried giving the same question and task a few times to Codex and it did a way better job.

second this
