Cursor 2.2: Multi-Agent Judging

New in Cursor 2.2! · Full changelog · Main announcement


When running multiple agents, Cursor now automatically evaluates all runs and recommends the best solution.

How it works

After all parallel agents finish, Cursor evaluates each solution and picks a winner. The selected agent gets a comment explaining why it was chosen.

This helps when you’re exploring different approaches to the same problem. Instead of manually comparing outputs, you get a recommendation with reasoning.

Judging only happens after all parallel agents have completed.
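
Conceptually, this is a run-everything-then-judge loop. The Python sketch below is purely illustrative and not Cursor's actual implementation: the agent runs and the length heuristic are stand-ins (the real judge is itself a model), and the agent IDs are made up.

```python
import concurrent.futures

def run_agent(agent_id: str, task: str) -> str:
    # Stand-in for one parallel agent run; each agent produces its own
    # candidate solution to the same prompt.
    return f"[{agent_id}] candidate solution for: {task}"

def judge(task: str, solutions: dict[str, str]) -> tuple[str, str]:
    # Stand-in for the judge. In reality the judge is a model that reads
    # each solution and explains its choice; here we just pick the
    # longest answer as a dummy heuristic.
    winner = max(solutions, key=lambda aid: len(solutions[aid]))
    return winner, "placeholder reasoning"

def run_with_judging(task: str, agent_ids: list[str]) -> None:
    # Launch all agents in parallel and block until every run finishes:
    # judging only happens after all parallel agents have completed.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = {aid: pool.submit(run_agent, aid, task) for aid in agent_ids}
        solutions = {aid: f.result() for aid, f in futures.items()}

    winner, reasoning = judge(task, solutions)
    print(f"Recommended agent: {winner}\nWhy: {reasoning}")

run_with_judging("fix the flaky test", ["agent-a", "agent-b", "agent-c"])
```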

We’d love your feedback!

  • Is the reasoning offered by the “judge” agent helpful?
  • Does this change how you use parallel agents? Are you more likely to use them?
  • What improvements would you suggest?

If you’ve found a bug, please post it in Bug Reports instead, so we can track and address it properly, but also feel free to drop a link to it in this thread for visibility.

10 Likes

Glad this was added! I commented to Cursor on X that I’d love to see this taken a step further and have the judge actually review each of the independent implementations and pick and choose pieces from each. There’s no reason to think that Plan A would be better than Plan B across all dimensions. It would be great to critique each one and come up with a “best of both worlds” approach, especially if we do this during the planning phase prior to implementation.

3 Likes

It doesn’t work when Bedrock is toggled on.

I do the following:

Run the same task with 2–3 LLMs and save the results in separate MD files. Then, run a strong LLM on those results, asking it to extract the best parts from all of them.

This approach seems to work better than the built-in multi-agent judging.
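
Roughly, the merge step looks like the sketch below (the file names, prompt wording, and the openai client are just placeholders; any strong model works):

```python
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# One saved result per model, e.g. result_gpt.md, result_claude.md, ...
results = {p.name: p.read_text() for p in sorted(Path(".").glob("result_*.md"))}

prompt = "Here are several independent solutions to the same task:\n\n"
for name, text in results.items():
    prompt += f"## {name}\n{text}\n\n"
prompt += (
    "Compare them, critique each one, and produce a single merged solution "
    "that takes the best parts from all of them."
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder: use whichever strong model you prefer
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```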

1 Like

Hey @altrusl , could you share your prompt for this?

I’ve never used Multi-Agent mode; I always just choose the best agent model.

4 Likes

This is really interesting, but how do you set this up? I don’t see any controls or options. It would be great if, in the chat window, instead of pressing Enter you could press an alternate key to launch the prompt against multiple models all at once. (Maybe this is hidden somewhere?)

1 Like

Same question. How does one run multiple agents?

This is great. Could you share a bit more about what reasoning is used to evaluate the different approaches, Colin?

E.g., does it look at the scope of changes (and prefer minimal changes?), coding style, solution design, maybe the cost of the model (so we learn for next time), …

1 Like

I often make a plan with my main agent, then have it reviewed by two or three other agents if I’m not fully sure, and give all the feedback back to the main agent to pick what is valid and best. So please make a multi-review judge that picks the best option, and apply the same to a multi-implementation judge. For example, even when Opus comes up with a good plan, Codex or Gemini almost always has some useful feedback on it. Right now I have to do too much copy-pasting; another option would be to let me drag another agent’s chat into my main chat, like you can with files.

That’s a great new feature. I do something similar but more manually with Codex: four implementations, four PRs, then give an AI parameters for judging.

I also sometimes orchestrate different agents to pick apart and debate prospective approaches, using gh issue threads.
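
A rough sketch of how I script the judging step (the PR numbers and rubric criteria here are placeholders; it assumes the gh CLI is installed and authenticated):

```python
import subprocess

pr_numbers = [101, 102, 103, 104]  # hypothetical: one PR per implementation

rubric = (
    "Judge each diff on: correctness, scope of change (prefer minimal), "
    "readability, and test coverage. Rank the PRs and justify the ranking."
)

prompt = rubric + "\n\n"
for pr in pr_numbers:
    # `gh pr diff N` prints the diff for PR #N in the current repo
    diff = subprocess.run(
        ["gh", "pr", "diff", str(pr)], capture_output=True, text=True, check=True
    ).stdout
    prompt += f"## PR #{pr}\n{diff}\n\n"

# Paste `prompt` into (or send it to) whichever model acts as the judge.
print(prompt[:500])
```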

Hey @charles and @ymoisan

Great question, and one I had myself trying it out today.

After some time, a +1 appears on the chat that has been chosen as the best.

We are aware there are some improvements to be made here. I’ve sent this feedback on (along with my own).

The judge analyzes the logic behind each proposed solution and explores the codebase to confirm they’re correct. It doesn’t specifically optimize for code size, style, architectural choices, or cost. 🙂

If you disagree with the reasoning (“best” solution can be a little subjective), we’d love to hear feedback!

1 Like

No, more basic than that, @Colin: how do I even use multiple agents? I see no way to use this or to tell an agent to create multiple sub-agents. Is the basic functionality hidden?

1 Like

Glad to see what the app is doing after I made it

You should see a 1x appear next to the model name if you move your mouse into the chat window (just try hovering over the model name and you should see it appear).

1 Like

Where does the bug appear (feature/product)?

Cursor IDE

Describe the Bug

The multi-agent judging is not shown. It works on my Mac but not on Windows.

Steps to Reproduce

Just run any multi-model agent.

Expected Behavior

The multi-agent judging thumbs-up icon should appear on one of the models.

Screenshots / Screen Recordings

Operating System

Windows 10/11

Current Cursor Version (Menu → About Cursor → Copy)

Version: 2.2.20 (user setup)
VSCode Version: 1.105.1
Commit: b3573281c4775bfc6bba466bf6563d3d498d1070
Date: 2025-12-12T06:29:26.017Z
Electron: 37.7.0
Chromium: 138.0.7204.251
Node.js: 22.20.0
V8: 13.8.258.32-electron.0
OS: Windows_NT x64 10.0.26200

For AI issues: which model did you use?

Composer1 and Grok Code

For AI issues: add Request ID with privacy disabled

7ad8c2bd-62a3-4538-87f3-3a356eca1311

Does this stop you from using Cursor?

No - Cursor works, but with this issue

1 Like

I hit this too - for me it was just a UI refresh issue. Reloading the app (or restarting Cursor) made the multi-agent judging panel show up again. Also worth double-checking you’re on the latest version, since this seems a bit flaky right now.

Nope, no such pulldown?

A few issues with the multi-agent workflow:

  • sometimes the agents generate wildly different file names and structures
  • sometimes they generate the same filename
  • comparison is difficult for a human reviewer
  • presenting the result from one model to the other models is difficult, as the changes flow only back into the original folder
  • sometimes the models have different ideas, findings, and knowledge; it is hard to manually steal their homework and share it between the models so they can copy and remix
  • sending a new query to all the agents at once after the initial query is unstable and difficult
  • what happens to the forks after we close the chat session? Are they going to use up space forever?