Sub Agents won't run and are stuck at "New Subagent - Starting up"

+1 confirming this on Cursor 3.1.17 and 3.0.13, with a root-cause trace pinned to specific file/line locations in the renderer bundle. Multiple engineers on our enterprise tenant (Privacy Mode on, cloud/background agents disabled by admin policy) reproduce this 100% of the time. Downgrading to 3.0.13 with a fresh state.vscdb did not help.

Observed behaviour (matches @yaakov_fg and the other linked threads)

  • Subagent tile renders “New subagent - Starting up” with the pulsing shimmer, forever.
  • The subagent actually runs in the background. Two independent confirmations:
    1. When the subagent needs no user approvals, it completes and the main agent receives its output and continues normally, despite the tile still visibly pulsing “Starting up”. So the execution channel works; only the UI rendering channel is broken.
    2. main.log shows agent-loop wakelocks being acquired/released during the hang.
  • When the subagent hits a tool call that requires user confirmation (the in-tile “Run” button), the workflow deadlocks. The approval button lives inside the tile, the tile never renders its body, so there is no way for the user to approve and the main agent waits forever. This is the worst case: it silently wastes a full agent run.

Root cause (pinned to bundle lines)

Static RE of the beautified workbench.desktop.main.js from 3.1.17, cross-checked against live DevTools logs:

1. Tile renders “Starting up” iff its composer handle signal h() is undefined.
The SolidJS title memo is at the bundled equivalent of:

// tile title memo
K = Be(() => {
  const Ct = h()?.data.name;   // composer handle
  ...
  return /* subagent case */ ? (Ct || Ofl) : ...
})
// Ofl = "New subagent"
// Rendered as: <Show when={K() !== Ofl} fallback={<ANS/>}>
//   where ANS = <div class="... task-title-shimmer"><span>Starting up</span></div>

2. The handle setter is only called inside .then, and only when the resolved value is truthy: .then(jt => jt && setter(jt)). The .catch never fires in this failure mode.

Cn(() => {
  const at = _();
  if (at) t.getComposerHandleById(at).then(jt => {
    _() === at && jt && (g(jt), ...)   // gated on jt being truthy
  }).catch(jt => { console.error("Failed to get subagent handle:", jt) });
})

When getComposerHandleById resolves to undefined (not rejects), g is never called. The catch is dead code in this failure mode.
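
The dead-catch behaviour is easy to reproduce outside Cursor. Below is a minimal sketch using hypothetical stand-ins (getHandleStub, watchHandle, and the two callbacks are mine, not names from the bundle). A promise that resolves to undefined takes the .then branch, fails the truthiness gate, and never reaches .catch:

```javascript
// Hypothetical stand-in for getComposerHandleById: the load failed internally,
// but the error was swallowed, so the promise RESOLVES to undefined.
function getHandleStub() {
  return Promise.resolve(undefined);
}

// Mirrors the shape of the effect above: the setter is gated on a truthy result.
function watchHandle(getHandle, setHandle, onError) {
  return getHandle()
    .then((handle) => {
      if (handle) setHandle(handle); // skipped when handle is undefined
    })
    .catch(onError); // only runs when the promise rejects
}

const calls = { set: 0, error: 0 };
const done = watchHandle(
  getHandleStub,
  () => { calls.set += 1; },
  () => { calls.error += 1; }
);

done.then(() => {
  // Neither callback ran, so the caller cannot tell "failed" from "still loading".
  console.log(calls); // { set: 0, error: 0 }
});
```

If getHandleStub rejected instead, the error callback would run once, which is the only case the existing .catch covers.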

3. Why it resolves to undefined: two error-swallowing try/catch blocks.

// ComposerDataHandleStorageBackend.loadFromStorage
async loadFromStorage(e, t) {
  try {
    const i = await this.backend.load(e, t);
    if (i) { ... return this.registerComposer(i) }
    this.refById.delete(e);
    return
  } catch (i) {
    t(`[composer] getHandle: ${e} error=${i.stack}`),
    this.refById.delete(e),
    console.error("[composer] Error loading composer data:", i);
    return          // <-- returns undefined; does not rethrow
  }
}

// ComposerDataHandleStorageBackend.getHandle
async getHandle(e, t) {
  const i = { stack: [], error: void 0, hasError: !1 };
  try {
    const s = await this._resolveHandle(e, t ?? (() => {}));
    return s && this._touchLruCache(e, s), s
  } catch (r) {
    i.error = r, i.hasError = !0
    // ** no return statement; implicit undefined **
  } finally { __disposeResources(i) }
}

Neither swallow surfaces an error sentinel. The failure mode is indistinguishable from “handle not yet resolved”, and the UI has no way to recover.
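
One way to make the two cases distinguishable is a tagged result instead of a bare undefined. This is a sketch only: loadHandle, tileState, the status tags, and the backendLoad parameter are all hypothetical names, not anything found in the bundle.

```javascript
// All names here (loadHandle, tileState, the `status` tags) are hypothetical.
// `backendLoad` stands in for whatever actually reads the composer from storage.
async function loadHandle(backendLoad, id) {
  try {
    const data = await backendLoad(id);
    return data
      ? { status: "loaded", handle: data }
      : { status: "missing" };
  } catch (error) {
    // Surface the failure instead of collapsing it into undefined.
    return { status: "failed", error };
  }
}

// The tile can now pick one of three renderings instead of two.
function tileState(result) {
  if (result === undefined) return "starting-up"; // request still in flight
  if (result.status === "loaded") return "ready";
  return "error"; // "missing" or "failed": room for Retry / Open Logs / Dismiss
}

async function demoLoad() {
  const ok = await loadHandle(async () => ({ name: "Explore repo" }), "a");
  const bad = await loadHandle(async () => { throw new Error("decode failed"); }, "b");
  return [tileState(undefined), tileState(ok), tileState(bad)];
}

demoLoad().then((states) => console.log(states));
// [ 'starting-up', 'ready', 'error' ]
```

With this shape, undefined keeps its one legitimate meaning (nothing has resolved yet), and every terminal outcome carries a tag the UI can branch on.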

4. The live trigger for the failure in 3.1.17 is an MCPParams protobuf decode error on composer preload:

[composer] Error parsing toolFormerData
Error: cannot decode message aiserver.v1.MCPParams from JSON: key "serverIdentifier" is unknown
    at Object.readMessage
    at Rtr.fromJson
    at Rtr.fromJsonString
    at retrieveMessagesBatchInternal
    at getInitialMessages
    at loadFromStorage         <-- swallowed here
    at getHandle
    at getComposerHandleById
    at preloadComposerHandle   <-- fires on onMouseEnter

The 3.1.17 binary writes bubble JSON with an MCPParams.serverIdentifier field that its own decoder rejects as unknown. Writer and reader are the same binary. On the affected machine, 227 rows in cursorDiskKV carried this field; surgically deleting those rows silenced the decode error but did not fix the hang, because subsequent subagent sessions keep writing more serverIdentifier rows.
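
The mismatch can be illustrated with a hand-rolled toy decoder. To be clear, this is not the protobuf runtime Cursor ships: KNOWN_FIELDS, decodeParams, and every field name other than serverIdentifier are invented for illustration, and the payload shape is assumed. It only shows why a strict reader rejects a payload that carries a key missing from its schema, and how a lenient mode would tolerate it.

```javascript
// Toy schema: the reader's view of the message. "serverIdentifier" is
// deliberately absent, mimicking the mismatch in the stack trace above.
const KNOWN_FIELDS = new Set(["name", "tools"]); // hypothetical field names

function decodeParams(jsonText, { ignoreUnknownFields = false } = {}) {
  const raw = JSON.parse(jsonText);
  const out = {};
  for (const [key, value] of Object.entries(raw)) {
    if (KNOWN_FIELDS.has(key)) {
      out[key] = value;
    } else if (!ignoreUnknownFields) {
      throw new Error(`key "${key}" is unknown`);
    }
    // In lenient mode, unknown keys are dropped without raising.
  }
  return out;
}

// What the writer side emits (shape assumed for illustration).
const written = JSON.stringify({ name: "search", serverIdentifier: "srv-1" });

let strictError = null;
try {
  decodeParams(written);
} catch (e) {
  strictError = e.message;
}
console.log(strictError); // key "serverIdentifier" is unknown
console.log(decodeParams(written, { ignoreUnknownFields: true })); // { name: 'search' }
```

Lenient decoding hides the symptom but drops the field, so it is only appropriate if nothing downstream needs serverIdentifier; otherwise the reader's schema has to learn the field.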

5. The approval handler is also gated on h() — which is why the Run button is structurally unreachable.

Ge = () => {
  const Qe = h();
  if (!Qe) return;                  // early exit while handle is undefined
  ...
  mt.setSelectedOption(vx.RUN)
}

Even if the button were somehow visible, clicking it would no-op. There’s no handle-free path for approvals.

A second failure mode that compounds on enterprise tenants (worth investigating)

On our tenant the admin blocks cloud/background agents by policy. We see this repeated every few seconds while a subagent hangs:

[background_composer] Listing background composers
[transport] Connect error in unary AI connect
ConnectError: [unauthenticated] You are not authorized to use cloud agents in this team.
    at listBackgroundComposers
    at refreshBackgroundComposersInner

On our tenant, no composerData:* rows are written to state.vscdb for stuck subagents during the hang. A colleague on a tenant with cloud agents enabled does see such rows. This suggests subagent state reconciliation may share code paths with listBackgroundComposers, which is rejected with a 401 (unauthenticated) on our tenant. If true, that coupling would explain why the tile gets stuck even independently of the decode bug, because the composer is never registered in the renderer in the first place.

This is a hypothesis, not a confirmed finding — flagging it so Cursor can check.

What would fix this (minimal, low-risk)

  1. Stop swallowing errors in loadFromStorage and getHandle. Return a tagged error sentinel ({ __loadFailed: true, error }) or rethrow. The UI needs to be able to tell “failed to load” apart from “still loading”.
  2. Add an error state to the tile title memo. Right now K only has loading-or-loaded, so any non-success collapses to the “Starting up” shimmer. A visible error + Retry / Open Logs / Dismiss would make this class of bug self-clearing.
  3. Fix the MCPParams.serverIdentifier schema mismatch. Writer and reader are the same binary. Either add the field to the decoded schema, configure ignoreUnknownFields, or stop emitting it.
  4. Ungate approvals from handle loading. Provide a handle-free approval path (e.g. via toolCallHumanReviewService.getTerminalReviewModelForBubble(composerId, bubbleId)) so the Run button is reachable even if the tile is broken.
  5. Handle the cloud-agents 401 gracefully. Detect it once, mark the feature unavailable for the session, stop retrying. And — critically — verify that local subagent state reconciliation does not depend on a successful listBackgroundComposers round-trip.

None of the above requires a new feature; (1), (2), and (4) are ~10-line renderer changes each.
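
For item 5, here is a rough sketch of "detect once, stop retrying". Everything in it is an assumption for illustration: createBackgroundComposerPoller is a made-up name, listFn stands in for listBackgroundComposers, and the code property on the error is a guessed shape rather than Cursor's actual error type.

```javascript
// Hypothetical sketch: `listFn` stands in for listBackgroundComposers, and
// the `code` property on the error is an assumed shape, not Cursor's API.
function createBackgroundComposerPoller(listFn) {
  let unavailable = false; // latched for the rest of the session

  return {
    get unavailable() { return unavailable; },
    async refresh() {
      if (unavailable) return []; // no further network calls
      try {
        return await listFn();
      } catch (err) {
        if (err && err.code === "unauthenticated") {
          unavailable = true; // policy block: retrying cannot succeed
          return [];
        }
        throw err; // other errors stay visible to the caller
      }
    },
  };
}

async function demoPoller() {
  let attempts = 0;
  const poller = createBackgroundComposerPoller(async () => {
    attempts += 1;
    throw Object.assign(new Error("not authorized to use cloud agents"), {
      code: "unauthenticated",
    });
  });
  await poller.refresh();
  await poller.refresh();
  await poller.refresh();
  return { attempts, unavailable: poller.unavailable };
}

demoPoller().then((r) => console.log(r)); // { attempts: 1, unavailable: true }
```

The important property is that refresh() still returns an empty list rather than failing, so anything layered on top (including local subagent rendering, if it really is coupled) keeps working when cloud agents are blocked.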

Reproducibility

100% on our tenant, across cold restarts, fresh state.vscdb, and with/without MCP plugins. Affects both 3.1.17 and 3.0.13. The forum threads below suggest it also affects tenants without cloud-agents gating, so the decode-error path is likely sufficient on its own: