Can anyone tell me how good Composer 2.5 is?

I dropped Composer 2 because it consistently failed to understand requirements given as screenshots and often modified files outside the intended scope. In contrast, Sonnet 4.7, Opus 4.7, and GPT-5.5 do a much better job of understanding requirements and avoiding changes to important parts of the code that already work.

Yeah, I had a pretty similar experience with Composer 2 on larger codebases. It sometimes feels too eager to “help” and ends up touching files or logic that were never part of the request. That becomes especially risky when you already have stable parts of the app working.

For me, models like Anthropic's Claude Sonnet/Opus and OpenAI's GPT-5.5 tend to be much better at respecting boundaries and following the actual intent of the task, especially when working from screenshots or existing UI patterns.

That said, these Composer 2.5 benchmarks honestly look like a massive improvement over Composer 2, especially on harder real-world tasks. If they've improved scope control and context awareness, that's already a huge upgrade.

I haven’t used Composer 2.5 as much as the other models yet since it was released pretty recently, but I can already say it’s very smart, and now I’m using it for most tasks. For context though, I’m using it with Max mode enabled.

It is surprisingly solid for coding. My credits reset on the 23rd of this month, but I'm still using Composer 2.5 instead of going back to Opus 4.7. I like how fast Composer 2.5 is.

It is a slop machine, but it can get things done as long as you don't care about the quality of the code. It has the right thinking process: read files → read docs → edit files → run tests. But the code itself is, uh, not great. I tested it on a large codebase. I also tried it for online research, and it seemed a bit better than the previous version.

After using the Composer series across multiple releases, my conclusion is consistent: regardless of benchmark scores, these models are worker bees, not thinkers. So I’ve adapted my workflow accordingly.

I use Claude (Opus or Sonnet) for planning. There's something about the frontier models that just gets the intent right; they have an intuition for what you're actually trying to build, not just the code. That makes them good for brainstorming, pinning down requirements, and producing detailed, well-structured plans. Once a plan is solid, Composer 2.5 handles implementation. It's the hands, not the head.

GPT-5.5 sits somewhere in between. It’s not great at product intuition so I don’t rely on it for planning, but I’ll use it for complex implementation tasks where I don’t want to gamble on Composer. Every new Composer release is essentially a better worker for me. 2.5 is a step up from its predecessor in certain reasoning tasks, but not enough to trust it in the planning seat.

My workflow: I use the expensive API models for planning, with TDD baked in and clearly scoped units of work. Then I delegate those units to Composer 2.5. Sometimes I still run a reviewer agent to catch slop. It takes longer, but I'm also optimizing for cost; tokenmaxxing by using frontier models for everything isn't a luxury I have.

I think the reason frontier models win at planning isn't just raw coding knowledge; it's that planning involves product thinking, not just syntax. Composer models are developers. The frontier models are architects and product owners rolled into one. To be fair, Composer 2.5 is still much better than its predecessors, so for simpler tasks I let it plan nevertheless.

It's benchmaxxed, but it's okay for being part of the plan and stuff. I would not pay additional API cost to use Composer, though. But as part of Cursor, yeah, it's a good improvement. Its exploration is wrong less often now, too; before, it could get a bit sloppy when exploring.