What’s the point of multiple agents?

I have been coding with Cursor, creating plans and asking the agent to implement them.

Last week, the new update brought concurrent agents (selecting 1x, 3x) to accomplish a task. But I don’t understand how it works.

I mean, I understand how it works. A couple of days ago, I tested having three different models implement a large plan. Sonnet was best, but I already knew that.

But, if I understood correctly, it will always work like that: all agents doing the same task.

How is this useful? Should I worry about coordinating their work?

4 Likes

Hey, great question. The multiple agents feature (1x, 3x) uses a Best-of-N approach - it runs the same prompt in parallel on multiple models, then you compare their results and pick the best solution.

How it works:

  • Each agent runs in its own isolated git worktree
  • All agents tackle the same task with the same prompt
  • You review the different approaches and click “Apply All” on your preferred result

The point: different models have different strengths. This helps you quickly see which model produces the best code for your task without running them sequentially.

You don’t need to coordinate their work - they operate independently on identical copies. Check the Parallel Agents docs for more details.
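
If it helps to make “isolated git worktree” concrete: Cursor sets this up for you, but under the hood it’s roughly equivalent to the standard git commands below (the paths and branch names are made up for illustration).

```bash
# Each agent gets its own checkout of the same commit, so edits in one
# copy never touch the others. Paths/branch names here are illustrative.
git worktree add ../my-app-agent-1 -b agent-1
git worktree add ../my-app-agent-2 -b agent-2
git worktree add ../my-app-agent-3 -b agent-3

# Shows the main checkout plus the three independent agent copies.
git worktree list
```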

Let me know if this clears it up.

6 Likes

I also don’t see the value. Is anybody actually using this?

Maybe let some sloppers compete and see if one produces usable output? Just go straight to a better model for the same price.

1 Like

Thank you, @deanrie

I did this with a couple of tasks (assigning multiple agents), but I don’t see myself doing that again.

But if there were a way, after a plan has been revised, for us to split the work between parallel agents, that would be awesome.

If the plan is already built with the entire system in context, it’s highly parallelizable. There’s probably some coordination work around imports and linting, though.

1 Like

Just a way to use more AI credits and a complete waste IMHO.

Choose a good model, stick with it. Done.

3 Likes

It would be better if we could use different prompts in parallel - then it would make sense. Can you imagine a project where I divide it into 3 parts and use 3 different (or the same) models to work with 3 different prompts? It would be more productive!! Don’t you think, @deanrie?

1 Like

Thanks for the feedback! You’ve identified a common sentiment - the Best-of-N approach (comparing the same task across models) has a narrow use case.

Good news: there’s a community workaround for splitting different tasks across parallel agents. Check out this guide: 🚀 Cursor 2.0: Split tasks using PARALLEL AGENTS automatically in one chat! [How to setup worktree.json]

It uses a .cursor/worktrees.json setup script to automatically assign each agent a unique task number on launch. You can then prompt like: “Read your .agent-id file. Agent 1 does X, Agent 2 does Y…” and they’ll work independently on different subtasks.
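
The exact setup lives in that guide, but to give a rough (hypothetical) idea of the mechanism, the per-worktree setup script boils down to something like this - the numbering scheme and the .agent-id file name here are just an illustration, not the guide’s exact code:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: give each parallel-agent worktree a unique ID by
# finding its position in `git worktree list` and writing it to .agent-id.
set -euo pipefail

current=$(git rev-parse --show-toplevel)

# 1-based index of this worktree among all worktrees of the repo.
agent_id=$(git worktree list --porcelain | grep '^worktree ' | cut -d' ' -f2- \
  | grep -nxF "$current" | cut -d: -f1)

echo "$agent_id" > "$current/.agent-id"
echo "This worktree is agent #$agent_id"
```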

Your feedback about wanting native task-splitting is noted - I can see this would be more useful than model comparison for most workflows.

4 Likes

@deanrie I think the ‘Best-of-N’ use case would get more use if the evaluation approach were different.

Current approach - evaluate and compare code:

I’m meant to read the code of each and compare.

This is difficult.

Different approach - evaluate and compare the UX:

Allow me to view and interact with a dev or built version of each, so I can compare the UX.

In my case, where I use Next.js and a Stripe listener, this would involve running the following in each worktree and then allowing me to view them:

  1. npm run dev
  2. stripe listen (I can’t remember the exact command)

This is much easier for me to compare on.
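
To make it concrete, something like this per worktree would be enough for me to click around each version side by side (the ports and the webhook route are just assumptions for a typical Next.js + Stripe setup, not exact commands):

```bash
# Start each agent's copy on its own port (paths/ports are examples).
(cd ../my-app-agent-1 && npm run dev -- --port 3001) &
(cd ../my-app-agent-2 && npm run dev -- --port 3002) &
(cd ../my-app-agent-3 && npm run dev -- --port 3003) &

# One Stripe listener per copy, forwarding to that copy's webhook route.
stripe listen --forward-to localhost:3001/api/webhooks/stripe &
stripe listen --forward-to localhost:3002/api/webhooks/stripe &
stripe listen --forward-to localhost:3003/api/webhooks/stripe &
```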

2 Likes

I think we don’t need Best-of-N; what we need is just one perfect result.

1 Like

This would be great if it worked for planning and asks. Shorter answers/results are easier to compare.

There is no easy way to review the work. How do I switch between trees and see the results in my localhost?

1 Like

I was also wondering what the use case is for this, but looking for a use case doesn’t always mean you’ll end up finding one.
A few days passed and then I had a very repetitive task.

It involved upgrading 150+ files, and these needed some very tight instructions:

  • Copy from folder X to folder Y (in the repo, so Git would show the diff)
  • Edit a few texts/numbers in fixed places
  • Perform a set of checks and then report back to me, the user.
    I instructed that after my review and some more manual actions (that wouldn’t be worth prompting, like some code actions etc.), I would provide the desired ‘final’ step.
  • This final step involved marking the file as:
    DONE/TODO/WONTFIX
    Each option was given a destination path to move/remove the file, and I instructed the agent to prepare the commit message (containing a DevOps number, the file name and the result (done/todo/wontfix)).

These instructions were noted in a new Task123.md file in the .cursor/commands folder.
So all I had to do was open a new agent, type in /Task123, hit enter and the 1st agent would start, then I’d start a second, third, etc…
By that time I could review the file changes from the 1st agent and give it feedback to handle the files accordingly and provide me the commit message.
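
To sketch what that final step looked like per file (the folder names and the DevOps work-item number below are made-up placeholders), each agent effectively did something like:

```bash
#!/usr/bin/env bash
# Sketch of the per-file final step: move the file based on my verdict and
# prepare a commit message. Paths and the work-item number are placeholders.
file="$1"      # e.g. upgraded/SomeComponent.tsx
status="$2"    # DONE | TODO | WONTFIX
devops="AB#12345"   # placeholder DevOps work-item number

case "$status" in
  DONE)    dest="done" ;;
  TODO)    dest="todo" ;;
  WONTFIX) dest="wontfix" ;;
  *) echo "unknown status: $status" >&2; exit 1 ;;
esac

mkdir -p "$dest"
git mv "$file" "$dest/$(basename "$file")"
echo "Commit message: $devops $(basename "$file") $status"
```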

This 150+ file repetitive task was a good use case for me, as the files did not depend on one another and there was a clear set of tasks to perform.
For the 1st few runs I had to tweak the command a bit to make sure the agents knew exactly what to do (and NOT do).

So, while looking for a use case, I did not find one; but once you are aware of the capabilities, you’ll find use cases along the road. This will heavily depend on the project you are working on, though.
I’m loving a lot of Cursor’s features; you just have to be aware of their existence and use them when they fit the task.

1 Like

It’s to get different agents to do a particular task so you can find the best one - the one that might solve a certain problem or implement a certain look.

I was already playing with this on Codex - it was first possible there in cloud mode before you guys got it in Cursor.

So I understand there’s going to be a lot of copying and implementing happening between these tools, especially when a feature makes sense and is logical - all of them will keep adopting it and growing better.

Just beautiful, exciting times, man… competition grows quickly and easily, and the experience gets better for everyone.

1 Like

Exactly, you’re onto something, bro.

I’ve been saying this for the longest time, guy.

1 Like

I haven’t found the parallel agents feature to be useful for code production - I think subagents would be much better for that.

(Long explanation for why I think parallel agents aren't great for code production)

This is mainly because using parallel agents multiplies the amount of reviewing I have to do (+1 whole review session per agent, per message!), whereas in my normal workflow I usually just update my prompt or provide better/clearer context if something goes wrong with the initial output. In my experience, frontier models like GPT 5(.1) (codex) and Sonnet 4.5 do what I want in most cases when given good enough context. Poor output is usually a result of either my poor input or the model making a wrong assumption - which can almost always be solved by having it ask you clarifying questions in thinking mode before planning. To me, this means that poor output is the result of human error more often than not (my ambiguous/inadequate input resulted in poor output); parallel agents don’t help me with that.

Instead of reviewing multiple attempts at the same solution, I’d rather put my time toward improving the context in the single thread I have (which saves time as the chat grows, because the agent incrementally gains a better and better understanding of what I want). In other words, when reviewing multiple solutions, I spend my time trying to determine whether the output I’m presented with happened to be adequate by chance. On the other hand, when I actively try to improve context by updating a prompt, I’m actively increasing my chances of getting adequate output for the rest of the conversation.

I could see parallel agents being a good option if you have a detailed spec/plan ready to go ahead of time, though.

For me, the best use case for parallel agents is getting insights about some code or a system. The difference here is that I’m trying to learn/discover instead of produce; I don’t have a clear outcome in mind from the start, so having multiple perspectives at once is extremely helpful and actually saves me a lot of time. This is where the differences in models can really be made to shine.

Here are a few prompts I used recently this way:


Comparison: thinking through the difference between the contents of two files in order to find a useful indicator. Even though all 4 models discovered the same indicator, they all presented the information to me in a different way. Having information explained from 4 different angles really helped me grasp it quicker.
The files `@1-before-check.html`  and `@example-invisible.html`  both contain the contents of a webpage displaying a ReCAPTCHA. The difference is that `@example-invisible.html` has an "invisible" recaptcha; This means its checkbox isn't shown. Your task is to carefully compare the recaptcha components in `@1-before-check.html` and `@example-invisible.html` to see if you can find some designator or indicator that determines whether a captcha is visible or "invisible". Maybe an element is missing, some CSS is different, an attribute determines it, or something else. Feel free to be creative in your analysis and exploration. Think through this until you can determine a clear indicator. If you think for a long time and aren't able to recognize any clear indicators, feel free to say so.

Report back your results in a clear, understandable format. Respond inline in this chat - don't create a new file for your report.

Bug finding: GPT 5.1, 5.1 codex, and sonnet 4.5 each go off on their own paths when tracing through a feature. Having 3 different 'sets of eyes' looking for potential problems was great. I no longer needed to launch separate chats for this, and having 3 explanations helped me discover and think through potential issues much faster.
{{long knowledge primer about a large set of changes I just made to an existing feature}}

Now walk yourself through the implementation one more time. Look carefully for any potential logic errors, bugs, flow issues, or discrepancies where the implementation differs from the expectations I've laid out throughout this session.

I imagine you could do the same for debugging, brainstorming, learning, and “having the model ask you questions” (for the purpose of discovery) before planning. Weirdly, the effectiveness of these use cases seems to line up with the question “would you assign multiple humans to this task?”.

  • “Does it make sense to make multiple developers implement the same thing at the same time?” Probably not.
  • “Is it more effective for multiple developers to try and identify bugs in the new implementation?” If you have the manpower, yes - you’ll get way better coverage from having multiple perspectives. And with AI, we don’t have a “manpower shortage” issue.

I’m sure there’s at least one good use case for code generation itself here. I just haven’t found it yet, nor needed it with the way I work.

Ugh! Why are they working on features like this when the models can’t even follow a simple small-change task like “make it smaller” in a GUI component?
If they really wanna help us users, they should stop focusing on making them better at complex tasks… they should go back to making sure they can fix simple things first!

1 Like

Multiple agents is purely a revenue-generation con game for the newbie non-coding users being onboarded by the millions.

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.