Share your experience with Composer 2!

Kimposer 2.5, BTW.

As a test of the model, I had Composer 2 vibe code a system for the home that would replace Google, Amazon and all other CIA data collection points. It has a search engine, cloud storage (a custom Ceph-like system capable of k/v store and block store, aggregating unused storage on the LAN), email, SMS/IM, AI gateway & training support, RAG support, one-click adding of models from the HF catalog, and so much more. It mostly one-shotted it (if you can consider multiple executions on a single plan of action items to be a one-shot).

I was pretty impressed that it was able to do it and honestly expected a hot pile of garbage at the end of it. Problems did not start until the codebase got too large. I started running into weird quirks that I had come to expect from later GPT-4 and early GPT-5 models when dealing with a large codebase. When executing follow-up planned action items, the Composer 2 model would quickly forget what it’s done, what it needs to do, why it needs to do it, and how to do it. There were a few times when it needed to reason better and it felt like I was talking to someone without much experience, as it would overlook obvious issues and had “oh, duh.” moments when I had to explicitly tell it what to do and how to do it.

Nesting the ability to switch to the slower version of composer 2 (with a hidden hover element) after intentionally having it be two separate models in the selector is actually the most dark pattern ■■■■ you guys have done yet. you need to seriously re-evaluate what the hell.

I thought Cursor was at least genuinely building a brand new model themselves, and I was mainly disappointed by the false marketing of comparing it with Opus 4.6 and GPT 5.4. Last night I also had a really bad experience with their setup wiping out all my chat history across all my projects (user vs. system-wide installation). After seeing all this, and with the Kimi K2.5 model behind the scenes (it might be a good model, but it is Cursor that presented it as a Composer model and compared it with frontier models), I have lost trust in Cursor. Until now I used multiple Ultra plans to keep up with my usage. Cursor is a genuinely good LLM agent: it is fast compared to other agents even with the same model (as they index your entire codebase), and it orchestrates models really well. But with two massive blows within 24 hours, I am not going to renew my Cursor subscriptions. I am going to the source of the Opus models after this.

I think it’s pretty cool that they further post-trained this model. The situation is resolved, and Moonshot is also happy to see how they further enhanced it.

We end up with an optimized model for Cursor tasks at a competitive price.

This would have hardly been possible without a SOTA Chinese model—that’s the beauty of open source.

That is super nice. I wish Cursor had put it exactly like that and marketed it as such; then I would have just continued with their Super Ultra Subscriptions.

I came just for the gossip, but they already told me, lol. This whole thing about picking up models is complicated.

Only tried it for a few prompts but sadly not impressed. It made a few very basic logic mistakes in the code, and that’s new for me. I rarely had to correct Composer 1.5 in that sense, and it is absolutely not on par with Opus or Sonnet. Felt like I had to instruct a very stupid person. I will not give up on it though, as I’ve been a huge fan of Composer 1 and 1.5 and its speed. Been using 1.5 for 99% of my tasks.

For details: I asked it to create a function that formats numbers so they become 999, 1.00k, 10.0k, 100k and so on. Then I asked it to use the same function for distances so that it would become 10.00km for kilometers. But it just created another suffix on top of it, so the result became 10.00kkm. I mean, that’s pretty stupid.
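For anyone curious, here is a minimal Python sketch of roughly what was being asked for. The names (`format_number`, `format_distance`, `buggy_format_distance`) and the three-significant-figure rule are assumptions made for illustration, not the actual code from the project, and the original wanted two decimals for kilometers while this sketch keeps three significant figures throughout. The point is that the distance formatter should reuse the scaling step with its own unit table, instead of appending a unit to an already suffixed string.

```python
# Hypothetical sketch: all names and the rounding rule are assumptions.
NUMBER_SUFFIXES = ["", "k", "M", "G"]
DISTANCE_UNITS = ["m", "km"]  # input is assumed to be in meters


def _scale(value: float, max_index: int) -> tuple[float, int]:
    """Divide by 1000 until the value would print with at most three digits."""
    scaled = float(value)
    index = 0
    while abs(scaled) >= 999.5 and index < max_index:
        scaled /= 1000.0
        index += 1
    return scaled, index


def _format_scaled(scaled: float, index: int) -> str:
    """Format with about three significant figures; unscaled values print as integers."""
    if index == 0:
        return f"{scaled:.0f}"
    magnitude = abs(scaled)
    if magnitude < 9.995:
        return f"{scaled:.2f}"
    if magnitude < 99.95:
        return f"{scaled:.1f}"
    return f"{scaled:.0f}"


def format_number(value: float) -> str:
    scaled, index = _scale(value, len(NUMBER_SUFFIXES) - 1)
    return _format_scaled(scaled, index) + NUMBER_SUFFIXES[index]


def format_distance(meters: float) -> str:
    # Reuse the scaling logic, but pick the unit from the distance table.
    scaled, index = _scale(meters, len(DISTANCE_UNITS) - 1)
    return _format_scaled(scaled, index) + DISTANCE_UNITS[index]


def buggy_format_distance(meters: float) -> str:
    # The mistake described above: a unit stacked on top of the "k" suffix.
    return format_number(meters) + "km"


print(format_number(999), format_number(1000), format_number(10000), format_number(100000))
print(format_distance(10000))        # 10.0km
print(buggy_format_distance(10000))  # 10.0kkm
```

Splitting the scaling from the suffix lookup is just one way to do it; a single function taking the suffix table as a parameter would work as well.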

Unfortunately really bad. Doesn’t follow instructions well*, forgets about directions, outputs a LOT of content but all very “solves the issue”-ish. The implementation always technically works but to the detriment of readability. It doesn’t follow codebase style even when explicitly instructed and goes on “refactoring” side-quests that leave the code worse than before. This is not just a fluke; I have encountered this exact issue in EVERY session except for small, focused UI changes, which it seems great at. But I cannot use it for anything serious or more involved than a couple lines.

*Technically it follows instructions “overly well” - it focuses entirely on what I said and takes it at face value. It doesn’t model intuition at all. If I say “remove this because it’s ugly and doesn’t fit well with the style of the rest of the codebase” it’ll either just remove it, but not offer an alternative solution, or immediately jump into implementing an alternative that’s just as bad.

I tried to get it to implement a plan 4 times. All 4 times it didn’t implement the plan fully. I then asked what was going on and got a response saying “You’re right – the plan wasn’t implemented fully. Want me to go ahead now?” only for that run to also fall short of implementing the plan. (I went through the entire process again, NOT reusing the same plan .md file or anything like that. This is not a prompting issue, and there’s no faulty plan here.)

The performance is great, though, and for small focused UI changes it’s awesome.


EDIT: I’m not sure whether this is user error or not, by the way - these are my first impressions. It’s really overzealous. But maybe I need to adjust the way I write instructions etc. It’s a very different feeling model for sure.


EDIT 2: it is really, really good at focused UI changes and writing. It writes copy in a way Opus can’t. Leads me to think that maybe I just need to learn the ropes and ask it for smaller, incremental changes. It definitely is way too overzealous, but with smaller changes, it may work out well

Agree, I posted to congratulate them (“They did it”) and it was just the same old RLing. Bummer.

So I was working with 1.5 for a very long time, and my impression in a few lines is the following: Composer 2 feels smarter compared to 1.5, but at the same time I think I have to chat with 2 way more often to get the desired output. That wasn’t happening with 1.5, for sure.

kimi2.5?

I’m reasonably impressed with Composer 2.

Today’s widely reported revelation that it’s based on Kimi K2.5 is not something that worries me from a tech perspective.

I’m glad that Cursor are building on top of other available technologies, rather than trying to reinvent the wheel. We’ve already seen that even organisations like Apple don’t have deep enough pockets to build their own frontier models.

And for me, Cursor’s moat is about the way it contextualises and routes to those models.

But what worries me is the lack of transparency about this. All the announcements about a fantastic new proprietary model make me wonder what other announcements aren’t being entirely honest with us.

You’re surprised by a company being dishonest?

Maybe I’m looking at this incorrectly, but wouldn’t it be unreasonable to expect them to have their own base model? I had assumed that they were just doing supplemental training to make existing models work better with the IDE, plus maybe some extra coding-specific training runs.

A better marketing model might be a “+”, and then they could release dozens of models that have gone through additional Cursor training: GPT-120B-OSS+Cursor, Nemotron-3-300B-OSS+Cursor, etc. It differentiates but allows people to continue using the personalities and quirks they’ve grown accustomed to.

Given how often Jensen Huang mentions Cursor, I had expected that they got pre-release access to Nemotron-3 and used that.

BTW, Cursor Team has finally fixed the formatting of Kimi’s responses in Cursor :D

I’m not sure that Composer 2 is a good update over Composer 1 and 1.5. I’m using the Cursor IDE for embedded C++ development for ESP32. I don’t use it for vibe coding; I’m using the LLM as a development partner. Previously, I used this model escalation path: GPT 5.1 mini → Composer 1 / 1.5 → Sonnet 4.6. After updating to Composer 2, I found that for some issues GPT 5.1 mini is better than Composer 2. At least, GPT provides more relevant solutions. A few times, Composer 2 provided a stupid, erroneous solution. I have tried both the standard and fast variants and can’t find any significant differences. I’m going to test it for another 1-2 days, but the preliminary decision is to switch off Composer 2 and keep Composer 1.5 as long as possible.

I used it for exploring flows and cascading issues. It has no ability to contextualise the purpose of the code and its cascading effects. When I asked it to create a detailed implementation plan based on the findings, it could not do it. It created a plan, but one much less detailed than the assessment. The reports I get back are incomprehensible. So I am going back to 1.5 for debugging and planning and using 2 solely for implementation.

I’ve tried using Composer since version 1 and have literally decided to stay away from this model.

The model’s vision is simple: deliver the result as quickly as possible. In other words, it focuses on speed and sacrifices quality.

In terms of speed, it’s truly amazing. But in terms of quality, it simply delivers something that leaves much to be desired. Several times I asked for a simple implementation and it couldn’t understand, delivering something far from expectations.

Another point to note is that Composer simply ignores the rules configured in Cursor. I have several rules to facilitate delivering the code as quickly as possible, and Composer simply disregards them and delivers what it wants. Other models apply everything configured in the rules, including AUTO.

I confess that spending millions of tokens to deliver bad code doesn’t make much sense. Other people on my development team frequently complain that Composer is very weak, has a terrible interpretation of problems, does everything wrong, and completely ignores instructions.

Composer 2 has an ugly tendency to modify generated files. A previous example was the rewrite of the translation types. This example is small, but the previous one was a 4.5k-line edit. I’ve added a clause in AGENTS.md to guard against this, but it still worries me that I’ll have to add clauses for each generated file in my project.

It also tends to keep using bad patterns that are already in the project; I would like it to propose changes more often. So far I keep going with GPT-5.4 for planning and Composer 2 for execution, except for critical features or big changes. It just tends to drift and works better for focused changes.