Agent hangs silently when extension host restarts during in-flight tool call

Where does the bug appear (feature/product)?

Cursor IDE

Describe the Bug

When the extension host restarts while the AI agent has in-flight tool calls, the agent hangs silently — partial output, no error message, no recovery except reloading the window.

The root cause is visible in remoteagent.log:

RequestStore#acceptReply was called without receiving a matching request 346
RequestStore#acceptReply was called without receiving a matching request 347

The old extension host had pending tool-call requests (IDs 346–350). On restart, the new extension host has no record of those IDs. When replies arrive, it logs “no matching request” and drops them. The cloud agent waits forever for responses that no longer exist anywhere in the system.

In our environment this produced 340 orphaned request warnings and 6 extension host restarts in a single day.

Related: This may be the root cause behind some reports in the “Taking longer than expected” thread ([Cursor 2.5/2.6] Taking longer than expected) — the symptom is identical (agent freezes mid-turn), but that thread focuses on network/indexing causes. This report identifies a different mechanism: extension host restart orphaning in-flight requests.

Steps to Reproduce

  1. Configure an MCP server that crash-loops (e.g., a stdio command that exits immediately with code 1 due to missing credentials); a minimal config sketch follows this list.
  2. Open an agent chat and issue several tool calls. The crash-looping MCP server destabilizes the extension host through a positive feedback loop: the retry rate accelerates with MCP activity, from roughly one restart attempt every ~2 minutes at idle to one every 5–10 seconds under heavy use.
  3. When the extension host eventually restarts, any in-flight agent tool call hangs silently.
  4. Confirm by checking remoteagent.log for “RequestStore#acceptReply was called without receiving a matching request” at the timestamp of the hang.
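
For step 1, a minimal crash-looping server can be declared directly in the MCP config. This is a hypothetical sketch (the server name is invented), assuming the standard mcpServers stdio format in .cursor/mcp.json:

{
  "mcpServers": {
    "crashy": {
      "command": "sh",
      "args": ["-c", "echo 'missing credentials' >&2; exit 1"]
    }
  }
}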

Expected Behavior

On extension host restart, all pending agent tool-call requests should receive an error response (e.g., “Extension host restarted, please retry”) so the agent can recover or report the failure to the user. Currently they are silently dropped.

A secondary improvement: MCP servers that exit immediately should be retried with exponential backoff (e.g., 1s → 2s → 4s → … → 5 min cap), not on every MCP activity event. The current behavior creates a positive feedback loop that destabilizes the extension host.
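
For illustration, here is a minimal sketch of that backoff policy in TypeScript. All names are invented; this is not Cursor's reconnect code, only the suggested behavior:

// Hypothetical sketch of capped exponential backoff for MCP server restarts.
// The delay doubles per consecutive failure (1s -> 2s -> 4s -> ...), capped
// at 5 minutes, and the counter resets once the server stays up for a while.
const BASE_DELAY_MS = 1_000;
const MAX_DELAY_MS = 5 * 60_000;

function restartDelayMs(consecutiveFailures: number): number {
  return Math.min(BASE_DELAY_MS * 2 ** (consecutiveFailures - 1), MAX_DELAY_MS);
}

async function superviseServer(start: () => Promise<void>): Promise<never> {
  let failures = 0;
  for (;;) {
    const startedAt = Date.now();
    try {
      await start();                 // resolves (or rejects) when the server exits
    } catch { /* treat a rejection the same as an exit */ }
    // A run longer than 30s counts as healthy, so the backoff resets.
    failures = Date.now() - startedAt > 30_000 ? 1 : failures + 1;
    await new Promise(resolve => setTimeout(resolve, restartDelayMs(failures)));
  }
}

Decoupling restarts from MCP activity events is the key point: the delay depends only on the failure history, so heavy agent use can no longer accelerate the crash loop.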

Operating System

Linux

Version Information

Version: 2.5.26, Commit: 7d96c2a, Node.js: v20.18.2, Connection: Remote SSH

For AI issues: which model did you use?

claude-4.6-opus (bug is model-independent)

Additional Information

Log evidence

Extension host restarts (6 in one day)

From remoteagent.log:

17:35:53 Extension Host Process exited with code: null, signal: SIGABRT  ← OOM
19:23:25 Extension Host Process exited with code: 0, signal: null
21:48:32 Extension Host Process exited with code: 0, signal: null
22:40:08 Extension Host Process exited with code: 0, signal: null
23:36:59 Extension Host Process exited with code: 0, signal: null
00:06:24 (exthost6 started, RequestStore warnings for IDs 346–350)

Orphaned requests (340 total)

04:06:35 RequestStore#acceptReply ... without receiving a matching request 1
...
00:06:24 RequestStore#acceptReply ... without receiving a matching request 350

MCP crash-loop → connection drops (1:1 correlation)

Each crash-loop restart produced a pair of Connection closed errors consistently
~2.1 seconds later:

23:38:41.007 [info]  Handling CreateClient action  ← restart attempt
23:38:43.105 [error] McpError: MCP error -32000: Connection closed  ← 2.1s later
23:38:43.105 [error] McpError: MCP error -32000: Connection closed

42 such pairs in 14 minutes on one extension host.

Root cause analysis (from source)

The MCP layer is NOT the problem

The MCP protocol handler (cursor-mcp/dist/main.js) correctly handles
connection drops. The _onclose() method rejects all pending response handlers:

// De-minified for readability; identifier names restored from the MCP SDK
// (McpError / ErrorCode.ConnectionClosed, which is code -32000).
_onclose() {
  const handlers = this._responseHandlers;   // all pending request callbacks
  this._responseHandlers = new Map();
  this._transport = undefined;
  const error = new McpError(ErrorCode.ConnectionClosed, "Connection closed");
  for (const handler of handlers.values()) handler(error);  // every pending call gets an error
}

This is working correctly — the McpError: MCP error -32000: Connection closed
log entries ARE these rejections propagating.

The extension host restart IS the problem

The agent tool call path is:

Cloud agent → server-main.js (IPC) → Extension host → cursor-agent-exec → tool

When the extension host process exits, _cleanResources() fires _onClose and
kills the child process. But there is no mechanism to reject pending agent tool
call requests that were routed through the dying extension host’s IPC channel.

The remoteagent.log confirms this with 340 orphaned request warnings in
one day:

RequestStore#acceptReply was called without receiving a matching request 346
RequestStore#acceptReply was called without receiving a matching request 347
RequestStore#acceptReply was called without receiving a matching request 348

These fire when the new extension host receives replies for request IDs that
belonged to the old extension host. The old request IDs are gone — so the cloud
agent is waiting for responses that no longer exist anywhere in the system.
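
To make the mechanism concrete, here is a simplified model of a request store (all names invented; the real RequestStore is internal to the remote server). A restart replaces the store, so a late reply finds no entry and is dropped:

// Simplified, hypothetical model of the orphaning. Each request parks a
// promise under an ID; a reply resolves it. After a restart, the new store
// has no entries, so every late reply hits the warning branch.
class SimpleRequestStore<T> {
  private pending = new Map<number, (value: T) => void>();
  private nextId = 0;

  createRequest(): { id: number; promise: Promise<T> } {
    const id = this.nextId++;
    const promise = new Promise<T>(resolve => this.pending.set(id, resolve));
    return { id, promise };
  }

  acceptReply(id: number, value: T): void {
    const resolve = this.pending.get(id);
    if (!resolve) {
      // This is the warning seen in remoteagent.log: the ID belonged to a
      // store instance that died with the old extension host.
      console.warn(`acceptReply was called without receiving a matching request ${id}`);
      return;
    }
    this.pending.delete(id);
    resolve(value);
  }
}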

Suggested fix

In the extension host connection management layer (server-main.js), on
extension host exit:

  1. Track all in-flight agent tool call request IDs routed through the extension
    host IPC channel.
  2. On _cleanResources() / _onClose, reject all tracked requests with a
    retryable error (e.g., “Extension host restarted”).
  3. Ideally, re-route pending requests to the new extension host after reconnect.

This is analogous to how the pty host’s RequestStore has a 15-second timeout
on pending requests; the agent tool call pipeline needs the same kind of
safeguard, either via timeout or active rejection on channel close. A sketch
combining both follows.
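
A minimal sketch of that safeguard, with invented names (track, settle, rejectAll are not Cursor's actual API):

// Hypothetical sketch of steps 1-2 of the suggested fix: track in-flight
// tool-call requests per extension host channel, reject them all when the
// channel closes, and time out any request that outlives a deadline.
class InFlightToolCalls {
  private pending = new Map<
    number,
    { reject: (err: Error) => void; timer: ReturnType<typeof setTimeout> }
  >();

  track(id: number, reject: (err: Error) => void, timeoutMs = 15_000): void {
    // Timeout safeguard, analogous to the pty host RequestStore's 15s limit.
    const timer = setTimeout(() => this.fail(id, new Error(`Request ${id} timed out`)), timeoutMs);
    this.pending.set(id, { reject, timer });
  }

  settle(id: number): void {
    const entry = this.pending.get(id);
    if (!entry) return;              // late or unknown reply; nothing to do
    clearTimeout(entry.timer);
    this.pending.delete(id);
  }

  // Call from the channel's cleanup path (e.g. _cleanResources / _onClose).
  rejectAll(reason: string): void {
    for (const id of [...this.pending.keys()]) {
      this.fail(id, new Error(reason));
    }
  }

  private fail(id: number, err: Error): void {
    const entry = this.pending.get(id);
    if (!entry) return;
    clearTimeout(entry.timer);
    this.pending.delete(id);
    entry.reject(err);               // the agent sees an error instead of hanging
  }
}

On extension host exit, the connection layer would call rejectAll("Extension host restarted, please retry"), so the cloud agent can surface the failure or retry against the new extension host instead of waiting forever.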

Does this stop you from using Cursor?

Sometimes - I can sometimes use Cursor

Note that Cursor (Claude 4.6) believes it knows how to fix this bug, and estimates (!) that the fix should take less than an hour.

Hi James!

Thank you for this exceptionally well-documented report. The log evidence, root cause analysis, and clear reproduction steps are exactly what helps our engineering team act quickly.

Both issues you’ve identified are real and are being actively worked on:

  1. Orphaned tool calls on extension host restart — You’re correct that there’s a gap where in-flight agent tool-call requests aren’t properly rejected when the extension host exits. Our team is aware of this class of issue and has been working on improvements to extension host resilience, including better error propagation on restart.

  2. MCP crash-loop feedback — The MCP reconnect infrastructure does have exponential backoff, but as you’ve identified, under heavy MCP activity the behavior can accelerate restarts rather than throttle them. This feedback loop is also being addressed.

For immediate relief, if you can identify and fix the crash-looping MCP server (the one exiting with code 1 due to missing credentials), that would break the feedback loop and significantly reduce extension host restarts. You can check which MCP server is failing by looking at the MCP output logs: View > Output, then select MCP Logs or MCP: from the dropdown.

When the agent does hang, Restart Extension Host from the Command Palette (Ctrl+Shift+P) is quicker than a full window reload.

I’ve flagged this with our engineering team, along with your suggested fix approach. We’ll follow up as improvements ship.

Thank you for the detailed update!

Hey @James_Hunter!

The ‘Waiting for Extension Host’ / agent execution timeout issue has been addressed in a recent Cursor update. Updating to the latest version should resolve this.