The Real Gap Between Composer 2 and Opus Isn't Intelligence — It's Intent Inference

The Core Observation

I have noticed a substantial behavioral gap between Composer 2 and Claude Opus when they handle the same natural-language task in Cursor. The gap is not about raw intelligence on paper. It is about how each model interprets ambiguous intent and whether it commits to doing the work in the target system rather than in the chat.

The Concrete Example

A single sentence: “Summarize last week’s GitLab activities and write me an email draft.”

  • Composer 2’s typical behavior: gets confused about “where GitLab is” (treating it as a local repo lookup rather than a hosted service requiring API access via PriHelper with a user token), fails to latch onto the right tool chain, and ultimately dumps the email body into the chat as plain text, leaving the user to copy-paste it manually.

  • Opus’s typical behavior: correctly infers that “GitLab activities” means calling the PriHelper GitLab integration with a user token and fetching real events, and that “email draft” means creating an actual draft inside the mailbox system via the appropriate MCP/skill, not generating markdown in chat.

The Deeper Point: Inferring “Definition of Done”

My frustration is not about wording precision. It is about whether the model can read the unstated default from a casual sentence:

  • The deliverable should land in the right system (mailbox draft, GitLab API result), not in stdout.

  • “Done” means the user does not have to copy, paste, or shepherd the artifact further.

  • The model should persist past friction—try the skill, try the MCP, retry on failure—rather than fall back to the cheapest valid completion (chat text).

Composer 2 too often takes the cheap path (generate text that looks like an email); Opus more often takes the costly-but-correct path (drive the toolchain to a real side effect).

Why “Just Write Better Prompts” Is Not the Answer

I push back on the implicit suggestion that I should memorize precise incantations:

  • If the user must remember exact phrasings—“use PriHelper skill”, “create a draft via MCP, do not stdout”—then they have become a human router. What is the model for?

  • Users cannot reproduce the same precise wording every time. Real usage involves paraphrase, casual phrasing, and varying context. A model that only works under one rigid template is fragile by design.

  • Adding more skills and docs does not close the gap. Skills tell the model what to do once it knows which playbook applies. The weakness is mapping a fuzzy sentence to the right playbook—and persisting through tool calls instead of bailing to chat output. Composer 2 fails this mapping under paraphrase even when documentation is present; Opus generalizes more robustly.

The Non-Determinism Caveat

Even if the user types exactly the same characters every time, model output is not bitwise reproducible. Decoding is stochastic; routing, tool selection, verbosity, and the choice between “do the work” vs. “print a result” all vary across runs. So:

  • Demanding reproducibility through prompt engineering alone is an illusion.

  • Pinning behavior requires structural guardrails (always-on rules, tool-only deliverables, deterministic settings where they exist)—not user-side discipline about phrasing.
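A toy next-step choice shows why identical input does not guarantee identical behavior. This is not how Cursor or any particular model works internally; the action names and scores are invented. With temperature sampling, the same scores give different choices depending on the random draw (in this toy distribution the chat-only step gets roughly a one-in-three share), while greedy argmax decoding picks the same step every time. Greedy decoding stands in for the “deterministic settings where they exist” mentioned above; whether a product exposes such a setting is outside this sketch.

```python
import math
import random

# Hypothetical scores a model might assign to its next step for one fixed prompt.
ACTION_SCORES = {
    "call_gitlab_tool": 2.0,
    "create_mail_draft": 1.6,
    "print_email_in_chat": 1.8,
}


def softmax(scores: dict[str, float], temperature: float) -> dict[str, float]:
    if temperature <= 0:
        raise ValueError("temperature must be positive; use greedy_action for the zero limit")
    # Scale by temperature, subtract the max for numerical stability, then normalize.
    scaled = {k: v / temperature for k, v in scores.items()}
    m = max(scaled.values())
    exps = {k: math.exp(v - m) for k, v in scaled.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def sample_action(scores: dict[str, float], temperature: float, rng: random.Random) -> str:
    # Stochastic decoding: draw one step in proportion to its probability.
    probs = softmax(scores, temperature)
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]


def greedy_action(scores: dict[str, float]) -> str:
    # Deterministic decoding: always take the highest-scoring step.
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Same "prompt" (same scores), ten runs that differ only in the random draw.
    sampled = [sample_action(ACTION_SCORES, 1.0, random.Random(seed)) for seed in range(10)]
    print("sampled:", sampled)
    print("greedy :", [greedy_action(ACTION_SCORES) for _ in range(10)])
```

Note that even the greedy run only fixes the choice; it cannot make the top-scoring step the right one, which is why the list above also calls for structural guardrails.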

What the User Actually Wants

A model that, from an ordinary, possibly imprecise sentence, will:

  • Infer the real deliverable and where it should live (mailbox, ticket, file system—not chat).

  • Discover and use the available tooling (skills, MCPs, integrations) without needing to be named explicitly.

  • Persist through failure rather than collapsing to “here is some text, you handle it.”

  • Generalize across paraphrases, because users will never phrase the same task identically twice.

  • Acknowledge that the answer to “Composer 2 keeps missing this” is not “tell the user to write better prompts,” but to route higher-stakes, multi-tool, externally-deliverable tasks to a stronger model and to encode the non-negotiable defaults (e.g., “email tasks must produce a real draft, never stdout”) as always-applied rules, not as user-side memorized magic phrases.
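The last point (route by task type, and attach non-negotiable defaults to every request) might look roughly like this on the harness side. Everything here is hypothetical: `Task`, `ALWAYS_APPLIED_RULES`, `route`, `build_request`, and the routing heuristic are illustrations, not Cursor settings or an existing API; the model names are simply the two discussed in this post, and the rule texts restate defaults from the post.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    description: str
    # Hypothetical features a harness might extract from the request.
    tools_needed: set[str] = field(default_factory=set)
    external_deliverable: bool = False  # must land in a mailbox, ticket, or file


# Non-negotiable defaults attached to every request, so nobody has to
# remember a magic phrase for them to take effect.
ALWAYS_APPLIED_RULES = [
    "Email tasks must produce a real draft in the mailbox, never chat text.",
    "GitLab activity means the hosted service via the PriHelper integration with the "
    "user's token, not a local repo lookup.",
]


def route(task: Task) -> str:
    # Higher-stakes work (several tools, or an external side effect) goes to the stronger model.
    if task.external_deliverable or len(task.tools_needed) > 1:
        return "Opus"
    return "Composer 2"


def build_request(task: Task) -> dict[str, object]:
    # Always-applied: the rules are included regardless of how the task is worded.
    return {
        "model": route(task),
        "rules": list(ALWAYS_APPLIED_RULES),
        "prompt": task.description,
    }


if __name__ == "__main__":
    task = Task(
        "Summarize last week's GitLab activities and write me an email draft.",
        tools_needed={"gitlab", "mail"},
        external_deliverable=True,
    )
    request = build_request(task)
    print(request["model"])
    for rule in request["rules"]:
        print("-", rule)
```

The rules are deliberately not keyword-triggered: matching on phrases like “email” would reintroduce the paraphrase fragility the post argues against.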


The gap between Composer 2 and Opus, in the user’s experience, is not vocabulary—it is intent inference depth and willingness to drive a tool chain to its real-world conclusion. No amount of additional documentation or precise wording from the user side can fully compensate for that, because (a) users won’t phrase things identically, and (b) the model won’t respond identically even if they did. The fix lives in model selection and structural defaults, not in turning the user into a prompt librarian.


yep

Hi @Anudorannador, thank you for your thoughtful post! I’ve shared this with the team. We really appreciate you taking the time to distill your thoughts here; I think you touched on many of the areas where we’d like to improve the model. We’re very focused on making Composer more useful in your workflows and on having it succeed on ambiguous tasks.

We’ll keep you updated!

I agree with everything said above. Maybe there are skills that could improve how Composer works?

No, because Composer will ignore your skill file as well :slight_smile:. With Composer 2, I think you would need something like a “decision tree” maintained by a human, with branches and loops created or cut dynamically as you go, whereas Opus can handle that tree perfectly on its own and you won’t have to worry about it.

Well, my Opus limit has run out, and I don’t want to spend the money :rofl::rofl::rofl: Still, I just asked Composer 2 itself to install the skills I need, and it installed them, updated Cursor, and launched it. I’m honestly surprised, because Composer has finally started doing what it didn’t do before :grin::grin::grin: The result isn’t clear yet; it’s still in progress :)

Claude has been an overpriced model since Claude 4.0 (maybe even earlier). Its models are only worth choosing when you have bugs that the other models can’t find a fix for quickly.

GPT-5.5, even though it costs twice as much as GPT-5.4, is still half the price of Opus 4.7.

You can only use Opus full-time if you’re ready to burn through one Ultra subscription a week (or every three days :thinking:).