Backgrounded terminal hangs in IDE

Where does the bug appear (feature/product)?

Cursor IDE

Describe the Bug

Title: Integrated terminal / agent shell bootstrap can hang forever on snap=$(command cat <&3) — child cat blocks on read(fd 3), parent zsh blocks on readoutput; user command never executes

Environment: macOS 26.3.1 (arm64), Cursor (extension host spawning zsh wrapper).

Symptom: Terminal jobs that should finish in milliseconds run indefinitely. User sees a stuck shell; the actual command (e.g. zqk object update …) never starts.

Evidence (sample stacks):

Child /bin/cat: all samples in read() — blocked reading stdin (fd 3 in cat <&3).
Parent /bin/zsh: all samples in readoutput → read() — blocked collecting command-substitution output from the cat child.
Parent process is Cursor Helper (Plugin) — consistent with the sandbox/bootstrap wrapper.
Root cause (inferred): The file descriptor passed as fd 3 for the “environment snapshot” (or equivalent) never reaches EOF and may never deliver a complete payload, so cat never terminates and zsh never leaves command substitution. There is no timeout on this handshake, so the hang is unbounded.
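
The blocking behavior itself can be shown without Cursor. What follows is a minimal sketch, assuming a named pipe as a stand-in for whatever channel actually backs fd 3; the function name demo_fd3_eof, the FIFO path, and the use of fd 4 for the writer are all made up for illustration. It shows that delivering the payload is not enough: the reader returns only once the last write end is closed.

```shell
#!/usr/bin/env bash
# Sketch: a reader in $(cat <&3) stays blocked until the LAST write end closes.
# demo_fd3_eof and the FIFO path are illustrative, not Cursor's actual code.
demo_fd3_eof() {
  local dir fifo reader
  dir=$(mktemp -d) || return 1
  fifo="${dir}/snapshot"
  mkfifo "${fifo}" || return 1

  # Reader: same shape as the wrapper's snap=$(command cat <&3).
  # It is forked BEFORE the writer opens fd 4, so it does not inherit the write end.
  ( exec 3<"${fifo}"; snap=$(command cat <&3); echo "reader finished: ${snap}" ) &
  reader=$!

  exec 4>"${fifo}"          # writer side (stand-in for the extension host)
  printf 'payload' >&4      # data alone does not end the read
  sleep 1
  if kill -0 "${reader}" 2>/dev/null; then
    echo "reader still blocked after payload (no EOF yet)"
  fi

  exec 4>&-                 # closing the write end delivers EOF
  wait "${reader}"
  rm -rf "${dir}"
}

demo_fd3_eof
```

Run as-is, it prints the "still blocked" line first and the "reader finished" line only after fd 4 is closed. If the close is skipped, the reader never returns, which is the shape of the hang in the samples.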

Suggested fixes:

Hard timeout on the snapshot read phase (e.g. 2–5s wall clock): if cat <&3 does not complete, kill the child, close fds, and surface a clear error (“sandbox snapshot handshake timed out”) instead of hanging forever.

Guarantee writer-side closure: Ensure the extension host always closes the write end of fd 3 on success and on all error/cancel paths (crash-safe teardown), so read() cannot block forever waiting for EOF.

Size / chunk contract: If the snapshot is large, document a finite length prefix or framed messages so the reader knows when to stop without relying on a stuck peer.

Telemetry: Log time-to-first-byte and time-to-EOF on fd 3; flag runs where TTFB > threshold or EOF never arrives (this bug).

UX: If the handshake fails or times out, do not leave the terminal session wedged; reset or offer “Retry terminal bootstrap.”
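
As a rough illustration of the first suggestion, here is a sketch of a bounded snapshot read in bash. It is not Cursor's wrapper (which is zsh and not shown in full here); read_snapshot_with_timeout is a hypothetical helper, and exit code 124 for "timed out" is a convention borrowed from timeout(1), assumed here for convenience.

```shell
#!/usr/bin/env bash
# Sketch: bound the snapshot read with a wall-clock timeout instead of waiting
# for EOF forever. read_snapshot_with_timeout is a hypothetical helper.
read_snapshot_with_timeout() {
  local timeout_s="$1" tmp reader watchdog rc
  tmp=$(mktemp) || return 1

  command cat <&3 >"${tmp}" &            # the potentially unbounded read
  reader=$!

  # Watchdog: kill the reader if it is still running after the deadline.
  # Its output is discarded so it never holds a caller's pipe open.
  ( sleep "${timeout_s}"; kill "${reader}" 2>/dev/null ) >/dev/null 2>&1 &
  watchdog=$!

  wait "${reader}" 2>/dev/null; rc=$?
  kill "${watchdog}" 2>/dev/null
  wait "${watchdog}" 2>/dev/null

  if [[ "${rc}" -eq 0 ]]; then
    command cat "${tmp}"                 # complete payload, EOF was seen
  elif [[ "${rc}" -gt 128 ]]; then       # reader was killed by the watchdog
    echo "sandbox snapshot handshake timed out after ${timeout_s}s" >&2
    rc=124
  else
    echo "snapshot read failed (status ${rc})" >&2
  fi
  rm -f "${tmp}"
  return "${rc}"
}

# Writer closes promptly: the payload is printed and the status is 0.
read_snapshot_with_timeout 2 3< <(printf 'PATH=/usr/bin\n')
# Writer never closes within the deadline: the helper gives up after about 1s.
read_snapshot_with_timeout 1 3< <(sleep 3) || echo "status=$?"
```

The design choice here is to fail loudly rather than return a partial snapshot: on timeout nothing is printed to stdout, so a caller cannot mistake a truncated payload for a complete one.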

Severity: High. Users lose time, automation is blocked, and user tools (e.g. a CLI) look broken when in fact they never ran.

-------------- process sample ------------------------------
Analysis of sampling cat (pid 92093) every 1 millisecond
Process: cat [92093]
Path: /bin/cat
Load Address: 0x1024bc000
Identifier: cat
Version: 197
Code Type: ARM64E
Platform: macOS
Parent Process: zsh [92092]
Target Type: live task

Date/Time: 2026-04-07 20:58:38.598 -0700
Launch Time: 2026-04-07 20:30:45.886 -0700
OS Version: macOS 26.3.1 (25D2128)
Report Version: 7
Analysis Tool: /usr/bin/sample

Physical footprint: 1025K
Physical footprint (peak): 1025K
Idle exit: untracked

Call graph:
2601 Thread_1142245 DispatchQueue_1: com.apple.main-thread (serial)
2601 start (in dyld) + 7184 [0x18035dd54]
2601 ??? (in cat) load address 0x1024bc000 + 0x738 [0x1024bc738]
2601 ??? (in cat) load address 0x1024bc000 + 0xae8 [0x1024bcae8]
2601 read (in libsystem_kernel.dylib) + 8 [0x1806e5908]

Total number in stack (recursive counted multiple, when >=5):

Sort by top of stack, same collapsed (when >= 5):
read (in libsystem_kernel.dylib) 2601

Binary Images:
0x1024bc000 - 0x1024bd24f +cat (197) <77AA0C70-FE80-3990-B58A-0200FC65E3F2> /bin/cat
0x1802d0000 - 0x1803234bb libobjc.A.dylib (951.1) <9E66CFF2-3EBD-3242-8166-D5D0C204755B> /usr/lib/libobjc.A.dylib
0x180324000 - 0x180354bcb libdyld.dylib (1340) /usr/lib/system/libdyld.dylib
0x180355000 - 0x1803f4713 dyld (1.0.0 - 1340) <044CD67E-3A0A-3CA4-8BB3-A9687D5328FE> /usr/lib/dyld
0x1803f5000 - 0x1803f81d0 libsystem_blocks.dylib (96) /usr/lib/system/libsystem_blocks.dylib
0x1803f9000 - 0x18044e07f libxpc.dylib (3089.80.10) <3B264F68-2825-3790-8796-4B49581D6425> /usr/lib/system/libxpc.dylib
0x18044f000 - 0x18046f95f libsystem_trace.dylib (1815.80.2) <2CE5667A-97C5-35F3-8124-BB48185D0D3D> /usr/lib/system/libsystem_trace.dylib
0x180470000 - 0x18051cccf libcorecrypto.dylib (1922.80.7) /usr/lib/system/libcorecrypto.dylib
0x18051d000 - 0x180568c67 libsystem_malloc.dylib (792.80.2) <102061AD-AC62-30E9-A960-0CC2E38A2D5D> /usr/lib/system/libsystem_malloc.dylib
0x180569000 - 0x1805afe5f libdispatch.dylib (1542.0.4) <4C58AB31-F363-3E75-A8F8-302105812DBF> /usr/lib/system/libdispatch.dylib
0x1805b0000 - 0x1805b2feb libsystem_featureflags.dylib (101) <36BE75BA-0B53-3ACA-ABAF-C639D711C90B> /usr/lib/system/libsystem_featureflags.dylib
0x1805b3000 - 0x180635047 libsystem_c.dylib (1725.40.4) /usr/lib/system/libsystem_c.dylib
0x180636000 - 0x1806c8ea3 libc++.1.dylib (2000.67) <652836CA-32B1-3388-A72A-D6B90DDDA958> /usr/lib/libc++.1.dylib
0x1806c9000 - 0x1806e392f libc++abi.dylib (2000.67) <580BCE60-F6A6-3764-BFE9-57A0DB800C54> /usr/lib/libc++abi.dylib
0x1806e4000 - 0x18072049f libsystem_kernel.dylib (12377.91.3) <78EC33A6-6330-3836-8900-EB90836936E8> /usr/lib/system/libsystem_kernel.dylib
0x180721000 - 0x18072dacb libsystem_pthread.dylib (539.80.3) <0596A7B6-BCE2-3F06-A2E8-3EAAB5371ED8> /usr/lib/system/libsystem_pthread.dylib
0x18072e000 - 0x1807364af libsystem_platform.dylib (359.80.2) <62C9CD37-272D-3D2D-9A1C-6F4EF24F7EC7> /usr/lib/system/libsystem_platform.dylib
0x180737000 - 0x180766e23 libsystem_info.dylib (600) <05CCB208-0B54-38D8-B736-6C7043F446AA> /usr/lib/system/libsystem_info.dylib
0x1846e1000 - 0x1846eb397 libsystem_darwin.dylib (1725.40.4) /usr/lib/system/libsystem_darwin.dylib
0x184b52000 - 0x184b63ff3 libsystem_notify.dylib (344.0.1) <73D7BFA7-9433-3ECD-929F-06220C37F214> /usr/lib/system/libsystem_notify.dylib
0x186e2e000 - 0x186e48fdb libsystem_networkextension.dylib (2205.81.1) <645A9697-79B5-3E35-A642-5FFDD53CB6A7> /usr/lib/system/libsystem_networkextension.dylib
0x186ecc000 - 0x186ee3ff3 libsystem_asl.dylib (406) <2A30FE38-5014-334B-941B-579054EA8C1E> /usr/lib/system/libsystem_asl.dylib
0x188aa8000 - 0x188ab03b7 libsystem_symptoms.dylib (2158.80.11) /usr/lib/system/libsystem_symptoms.dylib
0x18ca72000 - 0x18caaa2bb libsystem_containermanager.dylib (725.80.5) <1DDF96F6-6A8A-33F5-8691-8D3356620687> /usr/lib/system/libsystem_containermanager.dylib
0x18dec8000 - 0x18decc66f libsystem_configuration.dylib (1385.80.4) /usr/lib/system/libsystem_configuration.dylib
0x18decd000 - 0x18ded36a7 libsystem_sandbox.dylib (2680.80.20) <8875D51A-D22D-3A6B-9A17-8D565C96262C> /usr/lib/system/libsystem_sandbox.dylib
0x18f466000 - 0x18f4691fb libquarantine.dylib (196.40.3) <84049E82-ACDC-39F3-A2D7-6ECB912B825A> /usr/lib/system/libquarantine.dylib
0x18fbf6000 - 0x18fbfd00b libsystem_coreservices.dylib (191.3.3) <94151A03-8600-3318-8719-103E2D5E83A5> /usr/lib/system/libsystem_coreservices.dylib
0x190150000 - 0x19018d8b7 libsystem_m.dylib (3309) <9B88F30F-0BEF-391B-A358-8D1E2B08CCBE> /usr/lib/system/libsystem_m.dylib
0x19018f000 - 0x190192537 libmacho.dylib (1030.6.3) /usr/lib/system/libmacho.dylib
0x1901ac000 - 0x1901b961f libcommonCrypto.dylib (600035) <04C03765-D34A-311F-AE1E-CFBB1E627E54> /usr/lib/system/libcommonCrypto.dylib
0x1901ba000 - 0x1901c3b0b libunwind.dylib (1900.125) /usr/lib/system/libunwind.dylib
0x1901cc000 - 0x1901d66ff libcopyfile.dylib (230.0.1.0.1) <4EBDFB06-9F51-3FEF-A4D8-B1AFB9D0E823> /usr/lib/system/libcopyfile.dylib
0x1901d7000 - 0x1901da95f libcompiler_rt.dylib (103.3) <55AFEF28-E352-38B8-ACD6-60B67AA83B68> /usr/lib/system/libcompiler_rt.dylib
0x1901db000 - 0x1901dfa1b libsystem_collections.dylib (1725.40.4) <17BD45DA-1D82-3A90-9F18-5D569A5693B5> /usr/lib/system/libsystem_collections.dylib
0x1901e0000 - 0x1901e34bf libsystem_secinit.dylib (168.40.2) <0928D7E4-5DB0-3B75-9DF5-08EC8E9C033B> /usr/lib/system/libsystem_secinit.dylib
0x1901e4000 - 0x1901e6b57 libremovefile.dylib (84) /usr/lib/system/libremovefile.dylib
0x1901e7000 - 0x1901e7f27 libkeymgr.dylib (31) <18ED97A3-6DD7-3EC2-AB38-E16A862BE1A7> /usr/lib/system/libkeymgr.dylib
0x1901e8000 - 0x1901f0f8f libsystem_dnssd.dylib (2881.80.4.0.1) /usr/lib/system/libsystem_dnssd.dylib
0x1901f1000 - 0x1901f60b3 libcache.dylib (95) /usr/lib/system/libcache.dylib
0x1901f7000 - 0x1901f8cf3 libSystem.B.dylib (1356) <44B9EC51-9AC9-3D3B-A7F8-DEB02133130A> /usr/lib/libSystem.B.dylib
0x27e771000 - 0x27e778319 libRosetta.dylib (367.3) <5315C55D-09E5-394B-A9E5-672B23847BF7> /usr/lib/libRosetta.dylib
0x27fe81000 - 0x27fe84deb libsystem_darwindirectory.dylib (122) <9AD833C7-D009-39FF-8D04-D57C937D9EAC> /usr/lib/system/libsystem_darwindirectory.dylib
0x27fe85000 - 0x27fe8e34b libsystem_eligibility.dylib (289.80.56) <0E973CB5-A124-3716-9841-1EA67809AF06> /usr/lib/system/libsystem_eligibility.dylib
0x27fe8f000 - 0x27fe96873 libsystem_sanitizers.dylib (25) <90C45178-3954-3108-9F2F-44E528DF5E44> /usr/lib/system/libsystem_sanitizers.dylib
0x27fe97000 - 0x27fe97bb7 libsystem_trial.dylib (474.2.0.5.2) /usr/lib/system/libsystem_trial.dylib

--------------------------- process sample ------------------------------

Analysis of sampling zsh (pid 92092) every 1 millisecond
Process: zsh [92092]
Path: /bin/zsh
Load Address: 0x104f84000
Identifier: zsh
Version: 113.40.1
Code Type: ARM64E
Platform: macOS
Parent Process: Cursor Helper (Plugin) [1008]
Target Type: live task

Date/Time: 2026-04-07 20:57:48.116 -0700
Launch Time: 2026-04-07 20:30:45.882 -0700
OS Version: macOS 26.3.1 (25D2128)
Report Version: 7
Analysis Tool: /usr/bin/sample

Physical footprint: 1857K
Physical footprint (peak): 1857K
Idle exit: untracked

Call graph:
2571 Thread_1142243 DispatchQueue_1: com.apple.main-thread (serial)
2571 start (in dyld) + 7184 [0x18035dd54]
2571 zsh_main (in zsh) + 1276 [0x104fb4c58]
2571 init_misc (in zsh) + 152 [0x104fb4060]
2571 execstring (in zsh) + 132 [0x104f96ea8]
2571 execode (in zsh) + 196 [0x104f96f90]
2571 execlist (in zsh) + 676 [0x104f97260]
2571 ??? (in zsh) load address 0x104f84000 + 0x139f8 [0x104f979f8]
2571 ??? (in zsh) load address 0x104f84000 + 0x17d80 [0x104f9bd80]
2571 prefork (in zsh) + 460 [0x104fe7d14]
2571 ??? (in zsh) load address 0x104f84000 + 0x646c0 [0x104fe86c0]
2571 getoutput (in zsh) + 716 [0x104f992e0]
2571 readoutput (in zsh) + 284 [0x104f994ec]
2571 read (in libsystem_kernel.dylib) + 8 [0x1806e5908]

Total number in stack (recursive counted multiple, when >=5):

Sort by top of stack, same collapsed (when >= 5):
read (in libsystem_kernel.dylib) 2571

Binary Images:
0x104f84000 - 0x1050090e7 +zsh (113.40.1) <9DBCB17C-8EDD-3708-890F-A1C89F8A6C16> /bin/zsh
0x1802d0000 - 0x1803234bb libobjc.A.dylib (951.1) <9E66CFF2-3EBD-3242-8166-D5D0C204755B> /usr/lib/libobjc.A.dylib
0x180324000 - 0x180354bcb libdyld.dylib (1340) /usr/lib/system/libdyld.dylib
0x180355000 - 0x1803f4713 dyld (1.0.0 - 1340) <044CD67E-3A0A-3CA4-8BB3-A9687D5328FE> /usr/lib/dyld
0x1803f5000 - 0x1803f81d0 libsystem_blocks.dylib (96) /usr/lib/system/libsystem_blocks.dylib
0x1803f9000 - 0x18044e07f libxpc.dylib (3089.80.10) <3B264F68-2825-3790-8796-4B49581D6425> /usr/lib/system/libxpc.dylib
0x18044f000 - 0x18046f95f libsystem_trace.dylib (1815.80.2) <2CE5667A-97C5-35F3-8124-BB48185D0D3D> /usr/lib/system/libsystem_trace.dylib
0x180470000 - 0x18051cccf libcorecrypto.dylib (1922.80.7) /usr/lib/system/libcorecrypto.dylib
0x18051d000 - 0x180568c67 libsystem_malloc.dylib (792.80.2) <102061AD-AC62-30E9-A960-0CC2E38A2D5D> /usr/lib/system/libsystem_malloc.dylib
0x180569000 - 0x1805afe5f libdispatch.dylib (1542.0.4) <4C58AB31-F363-3E75-A8F8-302105812DBF> /usr/lib/system/libdispatch.dylib
0x1805b0000 - 0x1805b2feb libsystem_featureflags.dylib (101) <36BE75BA-0B53-3ACA-ABAF-C639D711C90B> /usr/lib/system/libsystem_featureflags.dylib
0x1805b3000 - 0x180635047 libsystem_c.dylib (1725.40.4) /usr/lib/system/libsystem_c.dylib
0x180636000 - 0x1806c8ea3 libc++.1.dylib (2000.67) <652836CA-32B1-3388-A72A-D6B90DDDA958> /usr/lib/libc++.1.dylib
0x1806c9000 - 0x1806e392f libc++abi.dylib (2000.67) <580BCE60-F6A6-3764-BFE9-57A0DB800C54> /usr/lib/libc++abi.dylib
0x1806e4000 - 0x18072049f libsystem_kernel.dylib (12377.91.3) <78EC33A6-6330-3836-8900-EB90836936E8> /usr/lib/system/libsystem_kernel.dylib
0x180721000 - 0x18072dacb libsystem_pthread.dylib (539.80.3) <0596A7B6-BCE2-3F06-A2E8-3EAAB5371ED8> /usr/lib/system/libsystem_pthread.dylib
0x18072e000 - 0x1807364af libsystem_platform.dylib (359.80.2) <62C9CD37-272D-3D2D-9A1C-6F4EF24F7EC7> /usr/lib/system/libsystem_platform.dylib
0x180737000 - 0x180766e23 libsystem_info.dylib (600) <05CCB208-0B54-38D8-B736-6C7043F446AA> /usr/lib/system/libsystem_info.dylib
0x1846e1000 - 0x1846eb397 libsystem_darwin.dylib (1725.40.4) /usr/lib/system/libsystem_darwin.dylib
0x184b52000 - 0x184b63ff3 libsystem_notify.dylib (344.0.1) <73D7BFA7-9433-3ECD-929F-06220C37F214> /usr/lib/system/libsystem_notify.dylib
0x186e2e000 - 0x186e48fdb libsystem_networkextension.dylib (2205.81.1) <645A9697-79B5-3E35-A642-5FFDD53CB6A7> /usr/lib/system/libsystem_networkextension.dylib
0x186ecc000 - 0x186ee3ff3 libsystem_asl.dylib (406) <2A30FE38-5014-334B-941B-579054EA8C1E> /usr/lib/system/libsystem_asl.dylib
0x188aa8000 - 0x188ab03b7 libsystem_symptoms.dylib (2158.80.11) /usr/lib/system/libsystem_symptoms.dylib
0x18ca72000 - 0x18caaa2bb libsystem_containermanager.dylib (725.80.5) <1DDF96F6-6A8A-33F5-8691-8D3356620687> /usr/lib/system/libsystem_containermanager.dylib
0x18dec8000 - 0x18decc66f libsystem_configuration.dylib (1385.80.4) /usr/lib/system/libsystem_configuration.dylib
0x18decd000 - 0x18ded36a7 libsystem_sandbox.dylib (2680.80.20) <8875D51A-D22D-3A6B-9A17-8D565C96262C> /usr/lib/system/libsystem_sandbox.dylib
0x18f466000 - 0x18f4691fb libquarantine.dylib (196.40.3) <84049E82-ACDC-39F3-A2D7-6ECB912B825A> /usr/lib/system/libquarantine.dylib
0x18fbf6000 - 0x18fbfd00b libsystem_coreservices.dylib (191.3.3) <94151A03-8600-3318-8719-103E2D5E83A5> /usr/lib/system/libsystem_coreservices.dylib
0x190150000 - 0x19018d8b7 libsystem_m.dylib (3309) <9B88F30F-0BEF-391B-A358-8D1E2B08CCBE> /usr/lib/system/libsystem_m.dylib
0x19018e000 - 0x19018ec9b libcharset.1.dylib (113) /usr/lib/libcharset.1.dylib
0x19018f000 - 0x190192537 libmacho.dylib (1030.6.3) /usr/lib/system/libmacho.dylib
0x1901ac000 - 0x1901b961f libcommonCrypto.dylib (600035) <04C03765-D34A-311F-AE1E-CFBB1E627E54> /usr/lib/system/libcommonCrypto.dylib
0x1901ba000 - 0x1901c3b0b libunwind.dylib (1900.125) /usr/lib/system/libunwind.dylib
0x1901cc000 - 0x1901d66ff libcopyfile.dylib (230.0.1.0.1) <4EBDFB06-9F51-3FEF-A4D8-B1AFB9D0E823> /usr/lib/system/libcopyfile.dylib
0x1901d7000 - 0x1901da95f libcompiler_rt.dylib (103.3) <55AFEF28-E352-38B8-ACD6-60B67AA83B68> /usr/lib/system/libcompiler_rt.dylib
0x1901db000 - 0x1901dfa1b libsystem_collections.dylib (1725.40.4) <17BD45DA-1D82-3A90-9F18-5D569A5693B5> /usr/lib/system/libsystem_collections.dylib
0x1901e0000 - 0x1901e34bf libsystem_secinit.dylib (168.40.2) <0928D7E4-5DB0-3B75-9DF5-08EC8E9C033B> /usr/lib/system/libsystem_secinit.dylib
0x1901e4000 - 0x1901e6b57 libremovefile.dylib (84) /usr/lib/system/libremovefile.dylib
0x1901e7000 - 0x1901e7f27 libkeymgr.dylib (31) <18ED97A3-6DD7-3EC2-AB38-E16A862BE1A7> /usr/lib/system/libkeymgr.dylib
0x1901e8000 - 0x1901f0f8f libsystem_dnssd.dylib (2881.80.4.0.1) /usr/lib/system/libsystem_dnssd.dylib
0x1901f1000 - 0x1901f60b3 libcache.dylib (95) /usr/lib/system/libcache.dylib
0x1901f7000 - 0x1901f8cf3 libSystem.B.dylib (1356) <44B9EC51-9AC9-3D3B-A7F8-DEB02133130A> /usr/lib/libSystem.B.dylib
0x190232000 - 0x1902391bb libiconv.2.dylib (113) <91928A6D-E098-3B69-B8BB-024C1AED3F71> /usr/lib/libiconv.2.dylib
0x198666000 - 0x19869b573 libpcre.0.dylib (21) <0305CDF1-06A2-3D61-A6F8-6C0F1207E75E> /usr/lib/libpcre.0.dylib
0x1b225d000 - 0x1b2299407 libncurses.5.4.dylib (79) <071DBDFA-3CF3-3310-AF0A-BD03F7D21C00> /usr/lib/libncurses.5.4.dylib
0x27e771000 - 0x27e778319 libRosetta.dylib (367.3) <5315C55D-09E5-394B-A9E5-672B23847BF7> /usr/lib/libRosetta.dylib
0x27fe81000 - 0x27fe84deb libsystem_darwindirectory.dylib (122) <9AD833C7-D009-39FF-8D04-D57C937D9EAC> /usr/lib/system/libsystem_darwindirectory.dylib
0x27fe85000 - 0x27fe8e34b libsystem_eligibility.dylib (289.80.56) <0E973CB5-A124-3716-9841-1EA67809AF06> /usr/lib/system/libsystem_eligibility.dylib
0x27fe8f000 - 0x27fe96873 libsystem_sanitizers.dylib (25) <90C45178-3954-3108-9F2F-44E528DF5E44> /usr/lib/system/libsystem_sanitizers.dylib
0x27fe97000 - 0x27fe97bb7 libsystem_trial.dylib (474.2.0.5.2) /usr/lib/system/libsystem_trial.dylib

Steps to Reproduce

Ask the Cursor agent to run a background process. The hang is intermittent, but it eventually shows up.

Expected Behavior

The terminal should never hang indefinitely.

Operating System

macOS

Version Information

Version: 3.0.12 (Universal)
VSCode Version: 1.105.1
Commit: a80ff7dfcaa45d7750f6e30be457261379c29b00
Date: 2026-04-04T00:13:18.452Z
Layout: editor
Build Type: Stable
Release Track: Default
Electron: 39.8.1
Chromium: 142.0.7444.265
Node.js: 22.22.1
V8: 14.2.231.22-electron.0
OS: Darwin arm64 25.3.0

For AI issues: which model did you use?

Auto

Does this stop you from using Cursor

No - Cursor works, but with this issue

Hey, awesome bug report. The process samples and root cause analysis really help.

You nailed the issue: cat <&3 in the shell bootstrap phase has no timeout, and if the write end of fd 3 isn’t closed by the extension host, the shell hangs forever. This is a known class of terminal hang issues, but your report is the first one that pinpoints it precisely to the fd 3 handshake.

I shared this with the team along with your analysis. No timeline yet, but your report helps us prioritize it. Your fix ideas also got passed along, like a hard timeout on the snapshot read, making sure the writer side always closes, and adding telemetry for TTFB and EOF.

For now, the workaround is to restart the agent session if the terminal hangs. If you can reproduce it reliably, let us know if there are specific conditions where it happens more often, like certain commands, session length, or how many terminals are open.

Nice! I’ll keep it in mind and if I can spare the cycles to repro reliably I’ll do so. Cheers!

Another incident occurred, so I attempted a repro, but ultimately hit a wall identifying what fd 3 was waiting on. The issue is definitely within the extension host: something pipes bits into a stream and never sends EOF. Since the problem is intermittent, it is likely a race condition (perhaps a stream getting swapped in out of order under certain load, or an off-by-one situation). I can reproduce the behavior standalone (that is what is included below, à la agent), but I cannot reliably induce it with the extension host and agent. I gained some deeper insights while trying anyway.

Agent-exec / extension-host fd correlation — repro steps (scripts embedded)

Purpose: Correlate Cursor Agent shell activity with extension-host (agent-exec) filesystem syscall traces (fs_usage), and collect stack samples for the zsh snap=$(command cat <&3) path. macOS assumed (sudo, sample, lsof, fs_usage).

Working directory: repository root (the folder containing experiments/ and scripts/).


Reproduction procedure (summary)

  1. Host — Terminal.app (not the Agent panel): resolve extension-host (agent-exec) PID; run watch-eh-fd-once.sh (needs sudo for fs_usage). Output: experiments/cursor-fd64-harness/logs/fs_usage-once-<timestamp>.log.
  2. Agent — Cursor chat / agent terminal: while or near that window, run cat-loop.sh (or background bursts) so traffic goes through agent-exec.
  3. Host: run grep-fd-from-log.sh on the capture; look for F=64 vs F=40/F=65 (see harness README.md; filesys often favors terminal fds; lsof fd 64 on EH is ground truth when unclear).
  4. Optional — Agent: run cursor-agent-exec-sandbox-probe.sh foreground (~2 min) to emit sample artifacts under $TMPDIR/zqk-cursor-probe-<pid>/.
  5. Optional — Terminal: standalone-snapshot-fd3-*.sh / fd3-snapshot-hang-mechanism-demo.sh — OS-only “no EOF” read blocks (no Cursor).
  6. Optional — External Terminal during a live hang: cursor-agent-exec-hang-live-diagnostics.sh with ZSH_PID= set to the wrapper shell (or use standalone-snapshot-fd3-blocker.sh for a practice ZSH_PID).

Note: Agent cannot run interactive sudo fs_usage reliably; host runs the watcher.


Embedded: harness data files

Save under experiments/cursor-fd64-harness/data/:

sample01.txt

alpha

sample02.txt

bravo

sample03.txt

charlie

Embedded: experiments/cursor-fd64-harness/scripts/cat-loop.sh

#!/usr/bin/env bash
# Loop: cat small files (for Cursor Agent or manual run).
# Run from repo root: bash experiments/cursor-fd64-harness/scripts/cat-loop.sh
set -euo pipefail
ROOT="$(cd "$(dirname "$0")/.." && pwd)"
echo "cursor-fd64-harness cat-loop START $(date -Iseconds) ROOT=$ROOT"
for i in $(seq 1 "${1:-8}"); do
  for f in "${ROOT}/data"/*.txt; do
    echo "---- iter ${i} $(basename "$f") ----"
    cat "$f"
    sleep 0.15
  done
done
echo "cursor-fd64-harness cat-loop END $(date -Iseconds)"

Embedded: experiments/cursor-fd64-harness/scripts/watch-eh-fd-once.sh

#!/usr/bin/env bash
# Single sudo, one fs_usage run for DURATION_SEC (default 120). Easier than repeated slices.
# Usage: ./scripts/watch-eh-fd-once.sh
#   EH_PID=1184 DURATION_SEC=180 ./scripts/watch-eh-fd-once.sh

set -u
ROOT="$(cd "$(dirname "$0")/.." && pwd)"
mkdir -p "${ROOT}/logs"
LOG="${ROOT}/logs/fs_usage-once-$(date +%Y%m%dT%H%M%S).log"
EH_PID="${EH_PID:-$(ps -ax -o pid=,args= 2>/dev/null | awk '/extension-host \(agent-exec\)/ {print $1; exit}')}"
FD="${FD:-64}"
DURATION_SEC="${DURATION_SEC:-120}"

[[ -z "${EH_PID}" ]] && echo "Set EH_PID=" >&2 && exit 1

echo "One-shot fs_usage EH_PID=${EH_PID} for ${DURATION_SEC}s → ${LOG}"
echo "After run: grep 'F=${FD}' ${LOG}"
sudo fs_usage -w -f filesys -t "${DURATION_SEC}" "${EH_PID}" 2>&1 | tee "${LOG}"
echo "--- grep F=${FD} ---"
grep -E "F=${FD}([^0-9]|\$)" "${LOG}" | tail -50 || echo "(no lines with F=${FD})"

Embedded: experiments/cursor-fd64-harness/scripts/grep-fd-from-log.sh

#!/usr/bin/env bash
# Summarize fs_usage log lines for a given file descriptor (default 64).
#
# Usage:
#   ./scripts/grep-fd-from-log.sh logs/fs_usage-once-20260408T125319.log
#   FD=65 ./scripts/grep-fd-from-log.sh logs/fs_usage-once-*.log
#   ./scripts/grep-fd-from-log.sh   # uses latest logs/fs_usage-once-*.log

set -euo pipefail
ROOT="$(cd "$(dirname "$0")/.." && pwd)"
FD="${FD:-64}"

if [[ -n "${1:-}" ]]; then
  LOG="$1"
else
  LOG=$(ls -t "${ROOT}/logs"/fs_usage-once-*.log 2>/dev/null | head -1 || true)
fi

if [[ -z "${LOG}" || ! -f "${LOG}" ]]; then
  echo "Usage: $0 [path/to/fs_usage.log]" >&2
  echo "  Or put files under ${ROOT}/logs/fs_usage-once-*.log" >&2
  exit 1
fi

echo "Log: ${LOG}"
echo "FD filter: ${FD}"
echo "=== lines with F=${FD} (word boundary) ==="
grep -E "F=${FD}([^0-9]|\$)" "${LOG}" || echo "(none)"
echo ""
echo "=== counts by syscall token (lines mentioning F=${FD}) ==="
grep -E "F=${FD}([^0-9]|\$)" "${LOG}" | awk '{print $2}' | sort | uniq -c | sort -rn || true
echo ""
echo "=== sample: other common fds in same log (F=40, F=65) for contrast ==="
grep -E 'F=40([^0-9]|$)|F=65([^0-9]|$)' "${LOG}" | tail -15 || true
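
When it is unclear which descriptor to filter on, a frequency ranking of every F=<fd> token in a capture can help decide between F=64, F=40, and F=65. The following is a small sketch under that assumption; fd_histogram is a hypothetical helper that is not part of the harness, and the three demo log lines are synthetic stand-ins, not real fs_usage output.

```shell
#!/usr/bin/env bash
# Sketch: rank every F=<fd> token in an fs_usage capture by frequency.
# fd_histogram is a hypothetical helper, not part of the harness.
fd_histogram() {
  local log="$1"
  # -o prints each match on its own line; uniq -c counts; awk prints "F=<fd> <count>".
  grep -oE 'F=[0-9]+' "${log}" | sort | uniq -c | sort -rn | awk '{ print $2, $1 }'
}

# Demo on a synthetic log (made-up lines; only the F= tokens matter here).
demo_log=$(mktemp)
printf '%s\n' 'write F=64 B=0x10' 'write F=64 B=0x20' 'read F=40 B=0x10' >"${demo_log}"
fd_histogram "${demo_log}"
rm -f "${demo_log}"
```

Against a real capture the call would be fd_histogram logs/fs_usage-once-<timestamp>.log, with the most active descriptors listed first.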

Embedded: experiments/cursor-fd64-harness/scripts/watch-eh-fd-iterate.sh

#!/usr/bin/env bash
# Run watch-eh-fd-once.sh several times in a row (separate fs_usage captures, one sudo each).
#
# Default 10 × DURATION_SEC (120s) = ~20 minutes of capture plus sudo prompts — tune ITERATIONS
# and DURATION_SEC before a long run.
#
# Usage (from repo root or this harness dir):
#   ./scripts/watch-eh-fd-iterate.sh
#   ITERATIONS=3 DURATION_SEC=30 ./scripts/watch-eh-fd-iterate.sh

set -euo pipefail

ROOT="$(cd "$(dirname "$0")/.." && pwd)"
cd "${ROOT}"

ITERATIONS="${ITERATIONS:-10}"
DURATION_SEC="${DURATION_SEC:-120}"

EH_PID="${EH_PID:-$(ps -ax -o pid=,args= 2>/dev/null | awk '/extension-host \(agent-exec\)/ {print $1; exit}' | tr -d ' ')}"
[[ -z "${EH_PID}" ]] && echo "No extension-host (agent-exec) found; start Cursor Agent or set EH_PID=" >&2 && exit 1

export EH_PID
export DURATION_SEC

printf 'EH_PID=%s  ITERATIONS=%s  DURATION_SEC=%s each (~%s s total fs_usage wall time)\n' \
	"${EH_PID}" "${ITERATIONS}" "${DURATION_SEC}" "$((ITERATIONS * DURATION_SEC))"

for ((x = 1; x <= ITERATIONS; x++)); do
	printf '\n======== iteration %s / %s ========\n' "${x}" "${ITERATIONS}"
	./scripts/watch-eh-fd-once.sh
	printf '---- COMPLETE iteration %s ----\n' "${x}"
done

Embedded: experiments/cursor-fd64-harness/scripts/watch-eh-fd.sh

#!/usr/bin/env bash
# Watch extension-host via fs_usage for activity on a given fd (default 64).
# Requires: sudo (repeated per slice — see README for credential caching). Run from external Terminal.
#
# Env: EH_PID FD SLICE_SEC DURATION_SEC — see README.

set -euo pipefail
ROOT="$(cd "$(dirname "$0")/.." && pwd)"
mkdir -p "${ROOT}/logs"
LOG="${ROOT}/logs/fs_usage-$(date +%Y%m%dT%H%M%S).log"
EH_PID="${EH_PID:-$(ps -ax -o pid=,args= 2>/dev/null | awk '/extension-host \(agent-exec\)/ {print $1; exit}')}"
FD="${FD:-64}"
SLICE_SEC="${SLICE_SEC:-2}"
DURATION_SEC="${DURATION_SEC:-180}"

if [[ -z "${EH_PID}" ]]; then
  echo "Could not find extension-host (agent-exec). Set EH_PID=." >&2
  exit 1
fi

echo "watch-eh-fd: EH_PID=${EH_PID} FD=${FD} SLICE_SEC=${SLICE_SEC} DURATION_SEC=${DURATION_SEC}"
echo "Log: ${LOG}"
_end=$((SECONDS + DURATION_SEC))

while [[ "${SECONDS}" -lt "${_end}" ]]; do
  echo "### slice $(date -Iseconds) ###" | tee -a "${LOG}"
  sudo fs_usage -w -f filesys -t "${SLICE_SEC}" "${EH_PID}" 2>&1 | tee -a "${LOG}" | grep -E "F=${FD}([^0-9]|\$)|pwrite|write\\(" || true
done

echo "Done. Full log: ${LOG}"
echo "--- lines mentioning F=${FD} ---"
grep -E "F=${FD}([^0-9]|\$)" "${LOG}" | tail -40 || true

Embedded: scripts/cursor-agent-exec-sandbox-probe.sh

#!/usr/bin/env bash
# Cursor agent-exec sandbox probe (manual / bug repro helper)
#
# PURPOSE
#   Exercise the same zsh wrapper Cursor uses for Agent terminal commands:
#     snap=$(command cat <&3); ... eval "$1" ...
#   Use this to compare successful runs vs hangs (sample zsh/cat PIDs while sleeping).
#
# HOW TO USE
#   1. In Cursor: run this as ONE Agent command (the whole script), NOT ./script from a normal terminal.
#   2. Optional: enable whatever "background" Agent mode you use when the bug appears.
#   3. Artifacts (macOS sample(1) + ps): printed as PROBE artifacts → TMPDIR/zqk-cursor-probe-<pid>/
#      — ps-tree-phase0.txt, sample-self-phase1-*.txt (this script’s PID), sample-wrapper-phase1-*.txt ($PPID zsh),
#        sample-grandparent-phase1-t8.txt, sample-child-*.
#   4. Optional manual Terminal.app while PHASE 1 runs:
#        ps -ax -o pid,ppid,lstart,command | grep 'snap=$(command cat'
#        sample <zsh_pid> 5 -file /tmp/zsh-success.txt
#   5. On hang: compare those stacks to success (read/readoutput vs nanosleep/sleep).
#
# EXPECTED WHEN SNAPSHOT SUCCEEDS
#   Early samples: zsh may still be in startup; then sleep(1) / wait — NOT stuck in readoutput.
#   If you never get past "PHASE 0", the hang is before your script (FD 3 / cat <&3) — same as known bug.
#
# FORENSICS: success vs Cursor FD3 hang vs unrelated “hang”
#   Success (wrapper zsh after snap closed): collapsed top often __sigsuspend; call graph includes
#     waitjobs → __sigsuspend, or bash child in __wait4 / sleep — not readoutput.
#   Real Cursor/agent-exec FD3 hang (user command never starts): wrapper zsh stuck in
#     prefork → getoutput → readoutput → read; immediate child cat(1) also in read (same FD pair).
#   Unrelated: e.g. diff(1) or a pipeline blocked in read on a pipe/FIFO — outer zsh may still show
#     waitjobs/__sigsuspend while the stuck process is a child (not the FD3 cat snapshot path).
#   Save bug-report samples: sample <zsh_pid> 5 -file .zqk/logs/cursor-agent-exec-compare/hang-<pid>-zsh.txt
#
# RULED OUT — narrows responsibility (not exotic OS / not “unsolved kernel”)
#   Stated so engineering time is not spent on dead ends that forensics already eliminate.
#
#   • Third mystery process / hidden FD3 contention: system-wide lsof on the unix NODE ids for the
#     hung pair showed only four rows — extension-host writer (e.g. fd 64), zsh fd 3, cat fd 0+3 dup.
#     No other process holds that socket pair; not a steal-by-fourth-party reader problem.
#   • Classic POSIX “lock” on fd 3: not how pipe/socket streams work; paired socket + dup’d cat fds
#     match normal bootstrap, not mutex-style deadlock between unrelated subsystems.
#   • “User command is slow”: hang stacks show readoutput/read before user eval runs; payload can be
#     trivial (e.g. short heredoc) and still never start — not workload-dependent.
#   • VS Code OSS missing piece: stock microsoft/vscode has no snap=$(cat <&3) path; fix is not
#     “wait for upstream VS Code merge” for this specific handshake — it is Cursor agent-exec + EH.
#   • Extension-host sample (hang): dominant stacks were uv__io_poll/kevent (event loop idle), not
#     write(2)/sendmsg stuck on the socket — consistent with “never completed write+close path” more
#     than kernel backpressure-only; still worth Cursor-owned logging/timeout on the snapshot writer.
#
# IN SCOPE FOR CURSOR (product / EH code)
#   Timeout + guaranteed close/shutdown of the writer side of FD3, telemetry (TTFB/EOF), and tests
#   that snapshot handshake always completes or fails fast — matches forum staff direction.
#
# WHILE HANG IS LIVE (external Terminal only)
#   scripts/cursor-agent-exec-hang-live-diagnostics.sh — multi-pass sample sequence + lsof + RSS;
#     detects drift (EH write vs kevent; zsh readoutput stable). Optional sudo: see
#     scripts/cursor-agent-exec-hang-live-diagnostics.sh header (dtruss/fs_usage macOS quirks).
#
# shellcheck disable=SC2034
set -u

PROBE_DIR="${TMPDIR:-/tmp}/zqk-cursor-probe-$$"
mkdir -p "$PROBE_DIR" || PROBE_DIR="${TMPDIR:-/tmp}"

probe_ps_tree() {
  local _out="$1"
  {
    echo "date=$(date -Iseconds) pid=$$ ppid=$PPID shell=$0"
    echo "=== self + parent ==="
    ps -p "$$" -o pid,ppid,user,lstart,command 2>/dev/null || true
    ps -p "$PPID" -o pid,ppid,user,lstart,command 2>/dev/null || true
    echo "=== children of $$ ==="
    # macOS: pgrep -P lists immediate children (e.g. cat for FD3 snapshot)
    local _ch
    _ch=$(pgrep -P "$$" 2>/dev/null || true)
    if [[ -n "${_ch}" ]]; then
      # shellcheck disable=SC2086
      ps -p ${_ch} -o pid,ppid,user,lstart,command 2>/dev/null || true
    else
      echo "(no children)"
    fi
  } >"$_out"
}

# sample_pid <pid> <seconds> <basename> — non-fatal if sample fails (wrong permissions, etc.)
sample_pid() {
  local _pid="$1" _secs="$2" _base="$3"
  [[ -z "${_pid}" ]] && return 0
  sample "${_pid}" "${_secs}" -file "${PROBE_DIR}/${_base}.txt" 2>"${PROBE_DIR}/${_base}.err" || true
}

echo "cursor-agent-exec-sandbox-probe START $(date -Iseconds) pid=$$"
echo "PROBE artifacts: ${PROBE_DIR}"

probe_ps_tree "${PROBE_DIR}/ps-tree-phase0.txt"

# PHASE 0: immediate marker — if this never prints, wrapper never reached user code.
echo "PHASE 0 echo ok"

# PHASE 1: long foreground sleep — timed sample(1) of $$ (+ wrapper $PPID + children) for stack paths.
# Note: $$ is the shell running THIS script (often bash); Cursor's snap=$(cat <&3) wrapper is $PPID (zsh).
echo "PHASE 1 foreground sleep 45s (auto sample → ${PROBE_DIR}) ..."
_sampler_pids=()
# Self (script interpreter): expect sleep/wait4, not stuck in read.
sample_pid "$$" 3 "sample-self-phase1-t0" &
_sampler_pids+=($!)
( sleep 15; sample_pid "$$" 3 "sample-self-phase1-t15" ) &
_sampler_pids+=($!)
( sleep 30; sample_pid "$$" 3 "sample-self-phase1-t30" ) &
_sampler_pids+=($!)
# Wrapper zsh (Cursor agent-exec): compare to hang stacks (readoutput/read vs waitjobs/sigsuspend).
sample_pid "$PPID" 3 "sample-wrapper-phase1-t0" &
_sampler_pids+=($!)
( sleep 15; sample_pid "$PPID" 3 "sample-wrapper-phase1-t15" ) &
_sampler_pids+=($!)
( sleep 30; sample_pid "$PPID" 3 "sample-wrapper-phase1-t30" ) &
_sampler_pids+=($!)
# Grandparent (often Cursor Helper (Plugin)) — one mid sample.
( sleep 8; sample_pid "$(ps -o ppid= -p "$PPID" 2>/dev/null | tr -d ' ')" 2 "sample-grandparent-phase1-t8" ) &
_sampler_pids+=($!)
# Any child processes (e.g. cat) while snapshot stream is active — hang case stays in read.
( sleep 5
  for _c in $(pgrep -P "$$" 2>/dev/null || true); do
    probe_ps_tree "${PROBE_DIR}/ps-tree-phase1-child-${_c}.txt"
    sample_pid "${_c}" 2 "sample-child-${_c}-phase1-t5"
  done
) &
_sampler_pids+=($!)

sleep 45
for _p in "${_sampler_pids[@]}"; do
  wait "${_p}" 2>/dev/null || true
done
probe_ps_tree "${PROBE_DIR}/ps-tree-phase1-end.txt"
echo "PHASE 1 done $(date -Iseconds)"

# PHASE 2: single background job + wait (job-control background inside this script's shell, not the Cursor UI background).
echo "PHASE 2 background sleep 60s + wait ..."
sleep 60 &
_bg_pid=$!
echo "PHASE 2 background pid=${_bg_pid}"
wait "${_bg_pid}"
echo "PHASE 2 done $(date -Iseconds)"

# PHASE 3: two concurrent sleeps — more fork churn.
echo "PHASE 3 two subshell backgrounds ..."
( sleep 25; echo "PHASE 3a done" ) &
( sleep 25; echo "PHASE 3b done" ) &
wait
echo "PHASE 3 done $(date -Iseconds)"

echo "cursor-agent-exec-sandbox-probe END $(date -Iseconds) — if you see this, snapshot + user script completed."

Embedded: scripts/cursor-agent-exec-hang-live-diagnostics.sh

#!/usr/bin/env bash
# Live diagnostics while a Cursor agent-exec FD3 hang is in progress.
#
# CRITICAL: Run this from Terminal.app or an external shell — NOT from the hung Agent terminal.
#   The hung session never reaches your command; this script must use a different TTY.
#
# What it does
#   • Resolves wrapper zsh (and child cat, parent extension-host) or uses ZSH_PID.
#   • Repeated "sample passes" (default: 3 passes, 15s apart) — see if stacks drift
#     (e.g. EH moves into write vs stays in kevent; zsh stays readoutput/read).
#   • lsof socket correlation once per run (and optional per pass with LSOF_EACH_PASS=1).
#   • Lightweight ps RSS each pass (memory creep).
#
# Usage
#   ZSH_PID=97092 bash scripts/cursor-agent-exec-hang-live-diagnostics.sh
#   PASSES=5 INTERVAL=30 SAMPLE_SECS=3 bash scripts/cursor-agent-exec-hang-live-diagnostics.sh
#
# Local repro (no Cursor): start scripts/standalone-snapshot-fd3-blocker.sh, then set
#   ZSH_PID to the printed BLOCKER_PID (name is historical: blocking shell, often zsh in Cursor).
#
# Optional (manual, sudo) — macOS quirks
#   • dtruss: On many macOS versions (especially with SIP), dtruss fails entirely with
#     "dtrace: invalid probe specifier" and dumps its D script — even with no -t flags. If so,
#     syscall tracing via dtruss is unavailable; do not spend more time on it; use fs_usage below.
#   • fs_usage (usually works): PID is positional — NO -p. -t <seconds> is run duration.
#       sudo fs_usage -w -f filesys -t 5 <EH_PID> 2>&1 | tee /tmp/fsu.txt
#     "F=N" is the fd number (decimal). Cross-check with: lsof -n -p <EH_PID> | awk '$4 ~ /^[0-9]+u?$/'
#     pwrite/pread on an fd that lsof shows as the unix pair to zsh fd3 is relevant; other fds are
#     often DB/logs (e.g. state.vscdb). Thread suffix ".NNNNN" is normal.
#   • GNU coreutils "timeout" is often missing on Mac; use: perl -e 'alarm 5; exec @ARGV' cmd ...
#
# shellcheck disable=SC2009,SC2086
set -euo pipefail

ZSH_PID="${ZSH_PID:-}"
PASSES="${PASSES:-3}"
INTERVAL="${INTERVAL:-15}"
SAMPLE_SECS="${SAMPLE_SECS:-2}"
LSOF_EACH_PASS="${LSOF_EACH_PASS:-0}"
OUT="${OUT:-}"

REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
if [[ -z "${OUT}" ]]; then
  mkdir -p "${REPO_ROOT}/.zqk/logs/cursor-agent-exec-compare"
  OUT="${REPO_ROOT}/.zqk/logs/cursor-agent-exec-compare/hang-live-$(date +%Y%m%dT%H%M%S)"
fi
mkdir -p "${OUT}"

if [[ -z "${ZSH_PID}" ]]; then
  # -F: match the wrapper text literally instead of as a regular expression.
  ZSH_PID=$(ps -ax -o pid=,args= 2>/dev/null | grep -F 'snap=$(command cat <&3)' | grep -v grep | awk '{print $1}' | head -1 || true)
fi
if [[ -z "${ZSH_PID}" ]]; then
  echo "No zsh running snap=\$(command cat <&3) found. Set ZSH_PID= manually." >&2
  echo "  For a deliberate local blocker: bash scripts/standalone-snapshot-fd3-blocker.sh" >&2
  exit 1
fi

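# || true: under set -e with pipefail, a vanished or invalid ZSH_PID would otherwise abort the script here.
EH_PID=$(ps -p "${ZSH_PID}" -o ppid= 2>/dev/null | tr -d ' ' || true)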
# launchd(1) / kernel idle as parent is not a useful "extension host" peer for lsof.
if [[ "${EH_PID}" == "0" || "${EH_PID}" == "1" ]]; then
  EH_PID=""
fi

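CAT_PID=""
# comm may be reported with a path (e.g. /bin/cat), so compare the basename only.
_is_cat() {
  local _name
  _name="$(ps -p "$1" -o comm= 2>/dev/null | tr -d ' ' || true)"
  [[ "${_name##*/}" == "cat" ]]
}
# cat may be a direct child or a grandchild of the wrapper; scan two levels.
for _c in $(pgrep -P "${ZSH_PID}" 2>/dev/null || true); do
  if _is_cat "${_c}"; then
    CAT_PID="${_c}"
    break
  fi
  for _g in $(pgrep -P "${_c}" 2>/dev/null || true); do
    if _is_cat "${_g}"; then
      CAT_PID="${_g}"
      break 2
    fi
  done
done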

SUMMARY="${OUT}/summary.txt"
{
  echo "hang-live-diagnostics $(date -Iseconds)"
  echo "OUT=${OUT}"
  echo "ZSH_PID=${ZSH_PID} CAT_PID=${CAT_PID:-?} EH_PID=${EH_PID:-?}"
} | tee "${SUMMARY}"

extract_top_stack() {
  local _f="$1"
  echo "--- ${_f} ---"
  grep -E '^Process:|^Parent Process:' "${_f}" 2>/dev/null || true
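  # || true: a missing or empty sample file must not abort the run under set -e with pipefail.
  grep -A 12 'Sort by top of stack' "${_f}" 2>/dev/null | head -14 || true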
  echo ""
}

snapshot_lsof() {
  local _tag="$1"
  local _f="${OUT}/lsof-${_tag}.txt"
  {
    echo "=== lsof ZSH ${ZSH_PID} (fd 3) ==="
    # lsof's FD column carries a mode suffix (r = read, w = write, u = read/write); accept all three.
    lsof -n -p "${ZSH_PID}" 2>/dev/null | awk '$4 ~ /^3[urw]?$/ {print}' || true
    if [[ -n "${CAT_PID}" ]]; then
      echo "=== lsof CAT ${CAT_PID} (0,3) ==="
      lsof -n -p "${CAT_PID}" 2>/dev/null | awk '$4 ~ /^[03][urw]?$/ {print}' || true
    fi
    if [[ -n "${EH_PID}" ]]; then
      echo "=== lsof EH ${EH_PID} (unix to same peer as zsh fd3 if any) ==="
      # || true: awk exits after the first match, which can fail the pipeline under pipefail.
      ZNODE=$(lsof -n -p "${ZSH_PID}" 2>/dev/null | awk '$4=="3u" && $5=="unix" {print $6; exit}' || true)
      lsof -n -p "${EH_PID}" 2>/dev/null | grep unix | head -40 || true
      if [[ -n "${ZNODE}" ]]; then
        echo "=== system-wide grep NODE ${ZNODE} ==="
        lsof -n 2>/dev/null | grep -F "${ZNODE}" || true
      fi
    fi
  } | tee "${_f}"
}

snapshot_ps_rss() {
  local _tag="$1"
  local _f="${OUT}/ps-${_tag}.txt"
  {
    echo "=== ps rss/vsz ${_tag} ==="
    for _p in "${ZSH_PID}" "${CAT_PID}" "${EH_PID}"; do
      [[ -z "${_p}" ]] && continue
      ps -p "${_p}" -o pid,rss,vsz,etime,comm 2>/dev/null || echo "pid ${_p} gone"
    done
  } | tee "${_f}"
}

grep_sample_sig() {
  local _f="$1"
  grep -E 'readoutput|read\b|write\b|sendmsg|kevent|uv__io_poll|__sigsuspend|waitjobs' "${_f}" | head -15 || true
}

snapshot_lsof "start"
snapshot_ps_rss "start"

for ((pass = 1; pass <= PASSES; pass++)); do
  echo "" | tee -a "${SUMMARY}"
  echo "========== PASS ${pass}/${PASSES} $(date -Iseconds) ==========" | tee -a "${SUMMARY}"
  snapshot_ps_rss "pass${pass}"

  for pid_label in "zsh:${ZSH_PID}" "cat:${CAT_PID}" "eh:${EH_PID}"; do
    IFS=: read -r label pid <<<"${pid_label}"
    [[ -z "${pid}" ]] && continue
    if ! kill -0 "${pid}" 2>/dev/null; then
      echo "skip ${label} pid ${pid} (exited)" | tee -a "${SUMMARY}"
      continue
    fi
    SF="${OUT}/pass${pass}-${label}-${pid}.txt"
    sample "${pid}" "${SAMPLE_SECS}" -file "${SF}" 2>"${OUT}/pass${pass}-${label}-${pid}.err" || true
    {
      echo ""
      echo "### pass${pass} ${label} ${pid}"
      extract_top_stack "${SF}"
      echo "signals:"
      grep_sample_sig "${SF}"
    } >>"${SUMMARY}"
  done

  if [[ "${LSOF_EACH_PASS}" == "1" ]]; then
    snapshot_lsof "pass${pass}"
  fi

  if [[ "${pass}" -lt "${PASSES}" ]]; then
    sleep "${INTERVAL}"
  fi
done

echo "" | tee -a "${SUMMARY}"
echo "Done. Full samples under ${OUT}; read summary.txt for collapsed tops + greps." | tee -a "${SUMMARY}"

Embedded: scripts/standalone-snapshot-fd3-analog.sh

#!/usr/bin/env bash
# Standalone analog of the Cursor agent-exec hang class:
#   snap=$(command cat <&3)
# with a peer fd whose write side never closes, so cat blocks in read(2) and the
# shell blocks in readoutput waiting for cat's stdout — same OS pattern, no Cursor.
#
# Quick rule-out: if a snapshot stream never terminates (no EOF on the snap fd), this outcome
# is likely. It is not Cursor-specific — ruling it in/out is fast (stacks + lsof). A fix on the
# producer side (close/finish the snapshot) applies when this matches. See also:
#   scripts/standalone-snapshot-fd3-blocker.sh — long-running background blocker for sample(1) drills.
#
# Two demos:
#   1) mkfifo + background writer (same idea as fd3-snapshot-hang-mechanism-demo.sh)
#   2) bash: exec 3< <(sleep …) then command substitution with cat <&3
#
# Usage: bash scripts/standalone-snapshot-fd3-analog.sh
# Optional: SAMPLE_PID=<pid> bash …  (after the demos finish, prints a sample(1) command hint for that PID; it does not run sample itself)

set -euo pipefail

demo_fifo() {
	echo ""
	echo "=== Demo A: mkfifo + writer that never closes (until we kill it) ==="
	local FIFO
	FIFO=$(mktemp -u "${TMPDIR:-/tmp}/snap-fifo.XXXXXX")
	mkfifo "$FIFO"
	cleanup_fifo() { rm -f "$FIFO"; }
	trap cleanup_fifo EXIT

	sleep 99999 >"$FIFO" &
	local _w=$!
	echo "writer PID=${_w} (holds FIFO write open — reader sees no EOF)"
	sleep 0.2

	echo "running: cat \"$FIFO\" with 2s limit (python) — expect TimeoutExpired"
	python3 - "$FIFO" <<'PY'
import subprocess, sys
fifo = sys.argv[1]
try:
	subprocess.run(["cat", fifo], timeout=2, check=False)
except subprocess.TimeoutExpired:
	print("  -> blocked on read until timeout (expected)")
PY
	kill "${_w}" 2>/dev/null || true
	wait "${_w}" 2>/dev/null || true
	trap - EXIT
	rm -f "$FIFO"
	echo "Demo A done."
}

demo_exec3_procsub() {
	echo ""
	echo "=== Demo B: exec 3< <(long sleep) — mirrors cat reading fd 3 with no EOF ==="
	# Inner script: open read end on 3; snap=$(command cat <&3) blocks like Cursor zsh.
	python3 <<'PY'
import subprocess, textwrap
script = textwrap.dedent(r'''
	set -u
	# Write end held by subshell running sleep; no bytes, no EOF until sleep exits.
	exec 3< <(sleep 86400)
	# \$ keeps this echo literal; unescaped, the echo itself would run cat <&3 and block here.
	echo "inner: fd 3 open; running snap=\$(command cat <&3) (blocks)"
	snap=$(command cat <&3)
	echo "inner: unreachable snap=${snap}"
''')
try:
	subprocess.run(
		["bash", "-c", script],
		timeout=3,
		check=False,
		capture_output=True,
		text=True,
	)
except subprocess.TimeoutExpired:
	print("  -> inner bash timed out at 3s: snap=$(command cat <&3) did not finish (expected)")
PY
	echo "Demo B done."
}

demo_fifo
demo_exec3_procsub

if [[ -n "${SAMPLE_PID:-}" ]]; then
	echo ""
	echo "=== sample(1) hint (optional): while a blocking demo runs, in another terminal: ==="
	echo "  sample ${SAMPLE_PID} 1 | head -5"
fi

echo ""
echo "All standalone demos finished. This reproduces the blocking-read class only; it does not invoke Cursor."
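
Reader-side sketch (not Cursor code): the demos above show that a reader waiting for EOF blocks for as long as any write end stays open. The following Python sketch, in the same spirit as the python3 timeouts used in the demos, bounds that wait with a deadline. The names read_until_eof and demo are hypothetical, and os.pipe() only stands in for whatever actually backs fd 3.

```python
import os
import select
import time


def read_until_eof(fd: int, timeout_s: float, chunk: int = 65536) -> bytes:
    """Read fd until EOF; raise TimeoutError if EOF has not arrived by the deadline."""
    deadline = time.monotonic() + timeout_s
    buf = bytearray()
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"no EOF after {timeout_s}s ({len(buf)} bytes read)")
        # Wait for data or EOF, but never past the deadline.
        ready, _, _ = select.select([fd], [], [], remaining)
        if not ready:
            continue  # the next iteration re-checks the deadline and raises
        data = os.read(fd, chunk)
        if not data:  # empty read: every write end is closed, i.e. EOF
            return bytes(buf)
        buf += data


def demo():
    # Case 1: the writer sends a payload and closes, so the reader sees EOF.
    r, w = os.pipe()
    os.write(w, b"snapshot")
    os.close(w)
    payload = read_until_eof(r, timeout_s=1.0)
    os.close(r)

    # Case 2: the write end stays open (the hang class); the reader times out
    # instead of blocking forever.
    r, w = os.pipe()
    os.write(w, b"partial")
    try:
        read_until_eof(r, timeout_s=0.2)
        timed_out = False
    except TimeoutError:
        timed_out = True
    finally:
        os.close(r)
        os.close(w)
    return payload, timed_out


if __name__ == "__main__":
    print(demo())
```

The deadline covers the whole read rather than each chunk, so a peer that trickles bytes without ever closing still trips the timeout.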

Embedded: scripts/standalone-snapshot-fd3-blocker.sh

#!/usr/bin/env bash
# Long-running local repro of the "no EOF on the snap fd" hang class (not Cursor code).
#
# Why this exists
#   If the peer write side of fd 3 never closes, `snap=$(command cat <&3)` never finishes:
#   cat blocks in read(2); the parent shell blocks in readoutput. That is normal OS behavior.
#   Ruling this in or out is fast: compare stacks (readoutput/read vs cat/read on fd 3) with
#   lsof on the unix pair — same pattern as a stuck stream. Fixing the producer to close or
#   finish the snapshot is then the right fix when this matches.
#
# This script starts that situation on purpose so you can practice `sample(1)` and
#   cursor-agent-exec-hang-live-diagnostics.sh without waiting for Cursor.
#
# Usage
#   bash scripts/standalone-snapshot-fd3-blocker.sh
#   WALL_SECS=120 bash scripts/standalone-snapshot-fd3-blocker.sh   # shorter sleep on write end
#
# Then in another terminal (not the one you might block):
#   sample <BLOCKER_PID> 2 -file /tmp/standalone-sample.txt
#   ZSH_PID=<BLOCKER_PID> bash scripts/cursor-agent-exec-hang-live-diagnostics.sh
#
# Stop the blocker:
#   kill <BLOCKER_PID>        # or: kill $(cat <state-file below>)
#
# shellcheck disable=SC2009
set -euo pipefail

WALL_SECS="${WALL_SECS:-600}"
REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
STATE_DIR="${REPO_ROOT}/.zqk/logs/cursor-agent-exec-compare"
mkdir -p "${STATE_DIR}"
STATE_FILE="${STATE_DIR}/standalone-fd3-blocker.pid"

# Write end stays open until sleep exits; read on fd 3 has no EOF until then.
# Outer bash runs: snap=$(command cat <&3) — same shape as the agent-exec wrapper.
bash -c '
exec 3< <(sleep '"${WALL_SECS}"')
echo "standalone-fd3-blocker: blocking until snap fd EOF (bash pid $$)" >&2
snap=$(command cat <&3)
echo "${snap}"
' &
BLOCKER_PID=$!

echo "BLOCKER_PID=${BLOCKER_PID}"
echo "${BLOCKER_PID}" >"${STATE_FILE}"
echo "STATE_FILE=${STATE_FILE}"
export SAMPLE_PID="${BLOCKER_PID}"
echo "SAMPLE_PID=${SAMPLE_PID}  (export this in another shell if you use sample(1) by hand)"

sleep 1.0
CAT_PID=""
if command -v pgrep >/dev/null 2>&1; then
	# cat may be a grandchild (bash → bash → cat); scan descendants shallowly
	for _c in $(pgrep -P "${BLOCKER_PID}" 2>/dev/null || true); do
		if [[ "$(ps -p "${_c}" -o comm= 2>/dev/null | tr -d ' ')" == "cat" ]]; then
			CAT_PID="${_c}"
			break
		fi
		for _g in $(pgrep -P "${_c}" 2>/dev/null || true); do
			if [[ "$(ps -p "${_g}" -o comm= 2>/dev/null | tr -d ' ')" == "cat" ]]; then
				CAT_PID="${_g}"
				break 2
			fi
		done
	done
fi
echo "CAT_PID=${CAT_PID:-unknown-yet}"

echo ""
echo "=== Run live diagnostics (uses ZSH_PID as the blocking shell; name is historical) ==="
echo "  ZSH_PID=${BLOCKER_PID} bash ${REPO_ROOT}/scripts/cursor-agent-exec-hang-live-diagnostics.sh"
echo ""
echo "=== Or sample the blocker only ==="
echo "  sample ${BLOCKER_PID} 2 -file ${STATE_DIR}/standalone-sample-blocker.txt"
echo ""
echo "=== When done ==="
echo "  kill ${BLOCKER_PID}"
echo ""
echo "Blocker running until sleep (${WALL_SECS}s) ends or process is killed."
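
Producer-side sketch (not Cursor code): the header comment of this script says the right fix, when the pattern matches, is for the producer to close or finish the snapshot. A minimal Python illustration of "close on every path" follows. The names snapshot_writer, read_all and demo are hypothetical, the payload is made up, and a pipe stands in for the real fd 3 transport. Note that EOF only arrives once every copy of the write end is closed, including copies inherited by child processes.

```python
import contextlib
import os


@contextlib.contextmanager
def snapshot_writer(write_fd: int):
    """Yield a send() function; always close write_fd afterwards, even on error."""

    def send(payload: bytes) -> None:
        view = memoryview(payload)
        while view:  # os.write may accept fewer bytes than offered
            written = os.write(write_fd, view)
            view = view[written:]

    try:
        yield send
    finally:
        os.close(write_fd)  # the reader's read() can now return EOF


def read_all(fd: int) -> bytes:
    """Blocking read until EOF, the same shape as cat <&3."""
    chunks = []
    while True:
        data = os.read(fd, 65536)
        if not data:
            return b"".join(chunks)
        chunks.append(data)


def demo() -> bytes:
    r, w = os.pipe()
    try:
        with snapshot_writer(w) as send:
            send(b"PATH=/usr/bin\n")
            raise RuntimeError("simulated failure mid-snapshot")
    except RuntimeError:
        pass
    # The finally clause closed the write end, so this returns instead of hanging.
    data = read_all(r)
    os.close(r)
    return data


if __name__ == "__main__":
    print(demo())
```

Without the finally clause, the simulated failure would leave the write end open and read_all would block exactly like the blocker above.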

Embedded: scripts/fd3-snapshot-hang-mechanism-demo.sh

#!/usr/bin/env bash
# Mechanical analog of the Cursor agent-exec hang class: a reader blocks in read(2)
# until the write side of a pipe/FIFO closes (EOF). If the writer never closes, read never returns.
#
# Cursor’s case: zsh runs snap=$(command cat <&3); extension host must close the write side of FD 3.
# This script does not involve Cursor — it only demonstrates the OS pattern.
#
# Usage: bash scripts/fd3-snapshot-hang-mechanism-demo.sh
#
# See also: scripts/standalone-snapshot-fd3-analog.sh — adds exec 3< <(sleep …) +
#   snap=$(command cat <&3) for a closer match to the zsh wrapper.

set -euo pipefail
FIFO=$(mktemp -u "${TMPDIR:-/tmp}/fd3-demo.XXXXXX")
mkfifo "$FIFO"
cleanup() { rm -f "$FIFO"; }
trap cleanup EXIT

sleep 99999 >"$FIFO" &
_writer=$!
echo "writer PID=${_writer} (holds FIFO write — no EOF yet)"
sleep 0.2

echo "running: cat on FIFO with 2s wall-clock limit (python) — same blocking read class as cat <&3"
python3 - "$FIFO" <<'PY'
import subprocess, sys
fifo = sys.argv[1]
try:
    subprocess.run(["cat", fifo], timeout=2, check=False)
except subprocess.TimeoutExpired:
    print("reader: timed out after 2s (expected — blocked on read)")
PY

kill "${_writer}" 2>/dev/null || true
wait "${_writer}" 2>/dev/null || true
echo "done."
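
Framing sketch (not Cursor code): the demo above depends on EOF to end the read. If the snapshot carried a length prefix, the reader could stop after the declared number of bytes even while the write end is still open. The 4-byte big-endian header and the names write_frame, read_exact, read_frame and demo are assumptions made for illustration only; nothing here describes the actual snapshot format.

```python
import os
import struct

# Assumed header format: 4-byte big-endian payload length.
HEADER = struct.Struct(">I")


def write_frame(fd: int, payload: bytes) -> None:
    """Write one length-prefixed frame to fd."""
    view = memoryview(HEADER.pack(len(payload)) + payload)
    while view:  # os.write may accept fewer bytes than offered
        written = os.write(fd, view)
        view = view[written:]


def read_exact(fd: int, n: int) -> bytes:
    """Read exactly n bytes, or raise EOFError if the stream ends early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = os.read(fd, n - len(buf))
        if not chunk:
            raise EOFError(f"stream ended after {len(buf)} of {n} bytes")
        buf += chunk
    return bytes(buf)


def read_frame(fd: int) -> bytes:
    """Read one frame: the header first, then exactly the declared payload."""
    (length,) = HEADER.unpack(read_exact(fd, HEADER.size))
    return read_exact(fd, length)


def demo() -> bytes:
    r, w = os.pipe()
    try:
        write_frame(w, b"snapshot-bytes")
        # The write end is deliberately still open; the reader finishes anyway.
        return read_frame(r)
    finally:
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    print(demo())
```

Framing removes the dependence on EOF but not on the peer: a writer that sends fewer bytes than it declared would still leave read_exact blocked, so a wall-clock limit on the read remains necessary.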