Parallel subagents: next batch waits for entire previous batch to finish

Where does the bug appear (feature/product)?

Cursor IDE

Describe the Bug

When dispatching more subagents than the parallel limit allows (e.g., 5 items with max 4 parallel), the remaining subagents wait until the entire first batch completes before starting. If 3 of 4 subagents finish in 2 minutes but the 4th takes 10 minutes, the 5th subagent sits idle for 8 minutes despite 3 free slots.
This significantly increases total execution time for workflows with many items of varying complexity.

Steps to Reproduce

Instruct agent to dispatch 5+ independent subagents in parallel (max 4 per batch). First batch of 4 launches. Some finish early but the 5th only starts after all 4 complete.

Expected Behavior

Queued subagents should start as soon as a slot frees up, not when the entire batch completes.

Operating System

Windows 10/11

Version Information

Version: 2.5.20 (user setup)
VSCode Version: 1.105.1
Commit: 511523af765daeb1fa69500ab0df5b6524424610
Date: 2026-02-19T20:41:31.942Z
Build Type: Stable
Release Track: Default
Electron: 39.4.0
Chromium: 142.0.7444.265
Node.js: 22.22.0
V8: 14.2.231.22-electron.0
OS: Windows_NT x64 10.0.22631

For AI issues: which model did you use?

claude-4.6-opus-max
thinking
Enabled MAX mode

Does this stop you from using Cursor?

No - Cursor works, but with this issue

Hey, I saw both of your threads, this one and the one about maximizing parallel dispatch. I get the issue. Right now subagents run in batches rather than in a “slot frees up, start the next one” fashion. That’s definitely inconvenient, especially when tasks vary a lot in runtime.

I shared this with the team. There’s no timeline yet, but your report helps us prioritize it.

I should have submitted this feature request earlier. I solved this locally with a pattern I now use regularly. The write-up below is a representation of that pattern; adapt it to fit your needs.

Hopefully the Cursor team can add this feature, as large, complex task-based orchestration currently takes much longer than it should.

Brian

Sub-Agent Orchestration Pattern (Continuous Dispatch)

What This Solves

Cursor’s default agent execution model is batch-and-wait: an orchestrator spawns up to 4 sub-agents as foreground tasks and blocks until all 4 complete before its next turn. Every batch is bottlenecked by its slowest task — all other slots sit idle once their tasks finish.

This pattern replaces batch-and-wait with a rolling window: maintain a fixed concurrency ceiling and fill each open slot the moment it becomes available, one slot at a time. Slots never idle while work remains.

A dependency extension further allows tasks to declare prerequisites. The queue becomes a DAG — tasks only become eligible when all their dependencies have completed. Slots are never held waiting for a blocked task when independent work exists.


Core Concepts

Background tasks (run_in_background: true) return immediately and write output to a transcript file. The orchestrator continues running rather than blocking. This is the prerequisite for rolling window behavior.

File-based coordination replaces in-memory state. Three files manage the queue lifecycle:

  • queue.txt — pending tasks, one per line
  • running.txt — currently in-flight task names
  • done.txt — completed task names (used for dependency resolution)

Completion signal — each sub-agent writes a result file to a known path on completion. The poll script checks for file existence. No transcript parsing, no API calls.
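A minimal standalone illustration of this signal (the temp directory and the `check` helper here are hypothetical stand-ins; the real scripts use /tmp/rwt):

```shell
# Before the result file exists, the task counts as still running;
# the moment the sub-agent writes it, the next poll sees completion.
DIR=$(mktemp -d)
task="task-01"
check() { [ -f "$DIR/${task}-result.txt" ] && echo "done" || echo "running"; }
BEFORE=$(check)                                   # no result file yet
echo "TASK: ${task}" > "$DIR/${task}-result.txt"  # sub-agent finishes
AFTER=$(check)                                    # poll now sees the file
echo "$BEFORE -> $AFTER"                          # prints: running -> done
rm -rf "$DIR"
```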

Pre-written scripts — all coordination logic lives in shell scripts written to disk before the loop starts. The orchestrator loop calls file paths, never constructs commands inline.


Scripts

All scripts operate on a shared directory. Replace /tmp/rwt with your working directory throughout.


poll.sh

Detects completions, logs them, updates running.txt and done.txt, and reports freed slot count. Run every N seconds inside the loop.

#!/bin/zsh
LOG=/tmp/rwt/timing.log
RUNNING_FILE=/tmp/rwt/running.txt
DONE=/tmp/rwt/done.txt
touch "$DONE"
[ ! -f "$RUNNING_FILE" ] && exit 0

FREED=0
while IFS= read -r task; do
  if [ -f "/tmp/rwt/${task}-result.txt" ] && ! grep -q "\[DONE\] ${task} " "$LOG"; then
    echo "[DONE] ${task} $(date '+%H:%M:%S')" >> "$LOG"
    echo "$task" >> "$DONE"
    # BSD/macOS sed; on GNU sed (Linux) drop the empty '': sed -i "/^${task}$/d"
    sed -i '' "/^${task}$/d" "$RUNNING_FILE"
    echo "FREED: ${task}"
    FREED=$((FREED+1))
  fi
done < <(cat "$RUNNING_FILE")   # snapshot the file so the in-loop sed is safe

echo "RUNNING: $(wc -l < "$RUNNING_FILE" | tr -d ' ') QUEUE: $(wc -l < /tmp/rwt/queue.txt | tr -d ' ') FREED: ${FREED}"

pop.sh

Scans the entire queue for tasks whose dependencies are satisfied, pops up to N of them, and returns their names. Blocked tasks remain in the queue untouched. With no dependencies, all tasks are immediately eligible.

#!/bin/zsh
N=${1:-1}
QUEUE=/tmp/rwt/queue.txt
DONE=/tmp/rwt/done.txt
touch "$DONE"

POPPED=0
REMAINING=()

while IFS= read -r line; do
  [[ -z "$line" ]] && continue
  task="${line%%:*}"
  deps="${line#*:}"

  # Quota for this call is used up: keep the rest of the queue untouched.
  if [ $POPPED -ge $N ]; then
    REMAINING+=("$line")
    continue
  fi

  READY=true
  if [ -n "$deps" ]; then
    IFS=',' read -rA dep_list <<< "$deps"   # zsh array read (bash: read -ra)
    for dep in "${dep_list[@]}"; do
      grep -q "^${dep}$" "$DONE" 2>/dev/null || { READY=false; break; }
    done
  fi

  if $READY; then
    echo "SPAWN_NEXT: ${task}"
    POPPED=$((POPPED+1))
  else
    REMAINING+=("$line")
  fi
done < "$QUEUE"

# Rewrite the queue with only the unpopped entries. Guard the empty case:
# printf with no arguments would emit a blank line, and that blank line
# would keep poll.sh reporting QUEUE: 1 forever, blocking termination.
if [ ${#REMAINING[@]} -gt 0 ]; then
  printf '%s\n' "${REMAINING[@]}" > "$QUEUE"
else
  : > "$QUEUE"
fi

spawn-log.sh

Logs spawns with timestamps and adds task names to running.txt. Call it immediately before issuing the corresponding Task calls.

#!/bin/zsh
LOG=/tmp/rwt/timing.log
for task in "$@"; do
  echo "[SPAWN] ${task} $(date '+%H:%M:%S')" >> "$LOG"
  echo "$task" >> /tmp/rwt/running.txt
done

Queue Format

Each line is a task entry: task-name:dep1,dep2,...

Tasks with no dependencies have an empty string after the colon:

task-01:
task-02:
task-03:task-01
task-04:task-01,task-02
task-05:task-03,task-04

Queue order determines priority among ready tasks — pop.sh scans from top to bottom and pops the first N eligible entries it finds. Tasks can be placed in any order; blocked tasks are skipped and independent tasks further down the queue will be popped instead.
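As a sanity check, the eligibility scan can be sketched standalone against the sample queue above. The temp directory, and the assumption that task-01 has completed and was already popped off the queue, are mine:

```shell
# Reproduce pop.sh's dependency check: a task is ready only when every
# comma-separated dependency appears in done.txt.
DIR=$(mktemp -d)
cat > "$DIR/queue.txt" <<'EOF'
task-02:
task-03:task-01
task-04:task-01,task-02
task-05:task-03,task-04
EOF
echo "task-01" > "$DIR/done.txt"   # task-01 already completed

READY_OUT=$(while IFS= read -r line; do
  [ -z "$line" ] && continue
  task="${line%%:*}"; deps="${line#*:}"
  ready=true
  if [ -n "$deps" ]; then
    for dep in $(printf '%s' "$deps" | tr ',' ' '); do
      grep -q "^${dep}\$" "$DIR/done.txt" || { ready=false; break; }
    done
  fi
  $ready && echo "READY: $task"
done < "$DIR/queue.txt")

echo "$READY_OUT"   # task-02 (no deps) and task-03 (task-01 done) are ready;
                    # task-04 and task-05 stay blocked
rm -rf "$DIR"
```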


Sub-Agent Contract

Each sub-agent must write a result file to /tmp/rwt/{task-name}-result.txt on completion. This is the only completion signal the orchestrator uses. The file contents are arbitrary — presence of the file is what matters.

Minimal agent prompt:

START=$(date '+%H:%M:%S')
# ... do work ...
END=$(date '+%H:%M:%S')
printf "TASK: {task-name}\nSTART: %s\nEND: %s\n" "$START" "$END" > /tmp/rwt/{task-name}-result.txt

Orchestrator Loop

Setup

Run once before the loop starts:

mkdir -p /tmp/rwt
touch /tmp/rwt/running.txt /tmp/rwt/done.txt
echo "[TEST START] $(date '+%H:%M:%S')" > /tmp/rwt/timing.log

# Write queue.txt with task entries
# Write poll.sh, pop.sh, spawn-log.sh to disk
# chmod +x all scripts

Initial Fill

Pop up to MAX_CONCURRENCY ready tasks and spawn them:

/tmp/rwt/pop.sh 4   # outputs SPAWN_NEXT lines
/tmp/rwt/spawn-log.sh task-01 task-02 task-03 task-04
# Task × 4 (run_in_background: true)

Poll Loop

Repeat until RUNNING=0 and QUEUE=0:

# 1. Atomic poll + pop
RESULT=$(sleep 3 && /tmp/rwt/poll.sh)
echo "$RESULT"
FREED=$(echo "$RESULT" | grep "^FREED:" | wc -l | tr -d ' ')

# 2. If slots freed and queue has work, pop exactly FREED ready tasks
READY_TASKS=$(/tmp/rwt/pop.sh $FREED)

# 3. Spawn exactly what pop.sh returned — nothing more, nothing less
/tmp/rwt/spawn-log.sh {tasks from READY_TASKS}
# Task × N (run_in_background: true)
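One detail the pseudocode leaves implicit is turning pop.sh's output into arguments for spawn-log.sh. A sketch, where the READY_TASKS value is a stand-in for real pop.sh output:

```shell
# Strip the SPAWN_NEXT prefix to get bare task names, one per line.
READY_TASKS="SPAWN_NEXT: task-05
SPAWN_NEXT: task-06"   # stand-in for $(/tmp/rwt/pop.sh $FREED)
TASKS=$(printf '%s\n' "$READY_TASKS" | sed -n 's/^SPAWN_NEXT: //p')
echo $TASKS            # prints: task-05 task-06 (sh/bash word splitting)
```

`/tmp/rwt/spawn-log.sh $TASKS` then passes one argument per task. Note that zsh does not word-split unquoted variables by default, so under zsh use `${=TASKS}` instead.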

Termination

Loop ends when poll.sh reports RUNNING: 0 QUEUE: 0.

Handle the edge case where RUNNING=0 and QUEUE>0: all remaining tasks have unresolvable dependencies (deadlock or missing prerequisite). Log an error and halt.
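That edge case can be detected mechanically from the same two counts the loop already tracks; a sketch using a hypothetical temp directory:

```shell
# RUNNING=0 with QUEUE>0 means every remaining task waits on a
# dependency that can never complete: halt instead of polling forever.
DIR=$(mktemp -d)
: > "$DIR/running.txt"
echo "task-99:task-missing" > "$DIR/queue.txt"   # dep never queued or run
RUNNING=$(wc -l < "$DIR/running.txt" | tr -d ' ')
QUEUE=$(grep -c . "$DIR/queue.txt")              # count non-empty lines
if [ "$RUNNING" -eq 0 ] && [ "$QUEUE" -gt 0 ]; then
  STATUS="DEADLOCK: $QUEUE task(s) blocked on unresolvable dependencies"
else
  STATUS="OK"
fi
echo "$STATUS"
rm -rf "$DIR"
```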


Key Rules

pop.sh decides — the orchestrator never pre-computes. Run poll.sh to learn what freed. Run pop.sh to learn what to spawn next. Spawn exactly what pop.sh returns. Never guess, predict, or assume which task comes next.

Poll and pop are one atomic operation per turn. Extract the freed count from poll.sh output and pass it directly to pop.sh in the same response. Never split these across separate turns.

Spawn immediately after pop, in the same response turn. No deliberation between pop.sh output and Task calls. The output says which tasks — spawn those tasks now.

Scripts are pre-written. The loop body calls file paths. It never assembles commands inline, never builds strings to evaluate.

Never spawn more tasks than freed slots. pop.sh returns 0 to N tasks. Spawn exactly that count. If pop.sh returns fewer than expected (some tasks still blocked), that is correct — do not fill the remaining slots from your own assumptions.

Poll interval scales with task duration. Use 3–5 seconds for tasks taking under a minute. Use 10–15 seconds for tasks taking several minutes. A poll interval that is a large fraction of task duration wastes throughput.


Dependency Behavior

Fan-out: One task unblocks multiple downstream tasks simultaneously. All newly eligible tasks are returned by the next pop.sh call and fill slots immediately.

Fan-in (merge): A task blocked on multiple prerequisites stays in the queue until the last prerequisite completes. pop.sh skips it on every call until then, filling slots with independent work instead.

Independent chains run in parallel automatically. No special orchestrator logic is needed. Two chains with no shared dependencies will saturate available slots concurrently by default.

Queue order is the tiebreaker. When multiple tasks become ready in the same poll cycle, pop.sh returns them in queue order. Place higher-priority tasks earlier in the queue if ordering matters.


Choosing Between Batch and Rolling Window

Use batch-and-wait (default Cursor behavior) when:

  • Total task count is small (8 or fewer)
  • Task durations are uniform (variance under 2×)
  • Simplicity is preferred — no scripts, no polling required

Use rolling window when:

  • Total task count is large (10 or more)
  • Task duration variance is high — some tasks finish in seconds while others run for minutes
  • Tasks have dependency relationships that would cause idle slots under batch mode
  • Maximizing throughput is the goal

The practical signal: if you find yourself waiting on a single slow agent while other slots cleared minutes ago, rolling window will reclaim that idle time.

One adaptation I use is to include the sub-agent type in the task name, so the orchestrator knows which type of agent to spin up and can hand the task details to the correct sub-agent for that specific task.

Thanks for sharing this!

Since run_in_background: true started working reliably, I’ve been using a simpler approach:
I enable it on subagents that should run in parallel, so the orchestrator dispatches batch after batch without waiting for the previous one to complete.
Then, in a dedicated step, I instruct the main agent to wait until all subagents are finished; what I observe is that it loops, sleeping and checking, until every subagent has completed, then continues.