Hook stdin pipe double-encodes non-ASCII on non-UTF-8 Windows

Where does the bug appear (feature/product)?

Cursor IDE

Describe the Bug

On Windows systems with a non-UTF-8 default code page (e.g., Chinese GBK/936), the JSON payload piped to hook stdin contains double-encoded non-ASCII characters.

Chinese text in the “prompt” field is garbled. For example, “测试一下” (4 Chinese characters) becomes “娴嬭瘯涓€涓嬶紝” — the UTF-8 bytes were misinterpreted as GBK and then re-encoded as UTF-8.

ASCII parts of the JSON (keys, braces, English text) are unaffected. Only non-ASCII string content is corrupted.

Verified using kernel32!ReadFile P/Invoke to read raw pipe bytes — the double-encoding exists in the pipe BEFORE any PowerShell/.NET processing.

Steps to Reproduce

  1. Use a Windows system with system locale set to Chinese Simplified (code page 936/GBK). Do NOT enable “Use Unicode UTF-8 for worldwide language support”.
  2. Create a hooks.json with a beforeSubmitPrompt hook
  3. In the hook script, read stdin and hex-dump the raw bytes
  4. Send a prompt containing Chinese characters (e.g., “测试中文”)
  5. Observe the hex dump: expected UTF-8 for “测” is E6-B5-8B, but actual pipe bytes show E5-A8-B4 (double-encoded)

Expected Behavior

The hook’s stdin pipe should contain valid UTF-8 JSON regardless of the Windows system code page. Non-ASCII characters in string fields (prompt, etc.) should be encoded as standard UTF-8.

Operating System

Windows 10/11

Version Information

IDE:
Version: 3.2.11
Commit: e9ee1339915a927dfb2df4a836dd9c8337e17cc0

For AI issues: which model did you use?

N/A

For AI issues: add Request ID with privacy disabled

N/A

Additional Information

Affects all Cursor users on non-UTF-8 Windows (Chinese/Japanese/Korean locale).

Root cause: When spawning hook child processes, the stdin pipe write likely uses the system default code page instead of UTF-8 for encoding the JSON string.

Suggested fix: Ensure child.stdin.write(json, ‘utf8’) or equivalent explicit UTF-8 encoding in the hook execution code path.

Workaround: We reverse the double-encoding in the hook script (decode UTF-8 → encode GBK → decode UTF-8), which recovers ~95% of text but is lossy for certain byte alignments (e.g., fullwidth colon + ASCII slash at specific positions).

This encoding issue also affects:

  1. Shell command stdout - when Agent runs commands like Get-Content
    on UTF-8 files, the terminal output is garbled (see screenshots)
  2. This happens with all models (Claude, GPT) and both IDE Agent
    and Codex/Background Agent
  3. Root cause is the same: child process I/O uses system code page
    (GBK 936) instead of UTF-8 on non-UTF-8 Windows systems

Does this stop you from using Cursor

Sometimes - I can sometimes use Cursor

Thanks for the detailed bug report! Really appreciate the root cause analysis, that’s super helpful.

We’ve filed a bug internally to track this. Thanks for sharing the workaround too. I understand it’s lossy and not ideal, but glad it gets most of the way there in the meantime!

Another option that should fully avoid the issue: if you install PowerShell 7 and make pwsh available on your PATH, Cursor will use it instead of Windows PowerShell 5.1. PowerShell 7 defaults to UTF-8 for all I/O, so the double-encoding shouldn’t happen.

It may not be related to PowerShell. I’m getting garbled Chinese even when running a Node script through hooks. I’m close to losing it—this is already affecting the system I’m developing. Could you let me know when this will be fixed? It worked before, but now everything is garbled.

windows cursor hook jsonl

{“conversation_id”:“e5ff1b10-f6af-4dde-bf43-bf4756a7a706”,“generation_id”:“5b744bd8-0a9e-4c72-a9dc-6eea03b2f9fa”,“model”:“default”,“composer_mode”:“agent”,“prompt”:“澶╂皵涔熶笉濂?77鍟奵ss”,“attachments”:[],“session_id”:“e5ff1b10-f6af-4dde-bf43-bf4756a7a706”,“hook_event_name”:“beforeSubmitPrompt”,“cursor_version”:“3.2.16”,“workspace_roots”:[“/E:/project/s鏃犳儏3/csdf”,“/C:/Users/yi.chen/.cursor/link-works-tracking”],“user_email”:“[email protected]”,“transcript_path”:“c:\\Users\\yi.chen\\.cursor\\projects\\e-project-s-3-csdf\\agent-transcripts\\e5ff1b10-f6af-4dde-bf43-bf4756a7a706\\e5ff1b10-f6af-4dde-bf43-bf4756a7a706.jsonl”}

控制面板 → 区域 → 管理 → 更改系统区域设置,勾选「Beta 版:使用 Unicode UTF-8 提供全球语言支持」,即可解决乱码问题。

补充精简版(适合写文档 / 备注)

控制面板 > 区域 > 管理 > 区域设置,勾选 Beta 版 UTF-8 选项,乱码问题修复。

路径对应英文

Control Panel → Region → Administrative → Change system locale

勾选:Beta version: Use Unicode UTF-8 for worldwide language support

This is the same bug @Colin mentioned above. Even though your hook runs a Node script, Cursor’s hook pipeline on Windows still passes the payload through PowerShell before piping it to your command.
Two workaround options:

  1. System locale change (what you found in post 11): Control Panel > Region > Administrative > Change system locale > check “Beta: Use Unicode UTF-8 for worldwide language support.” This works but affects all applications system-wide.

  2. Install PowerShell 7: Put pwsh on your PATH. Cursor will use it instead of PowerShell 5.1, and PS7 defaults to UTF-8 for all I/O.

We don’t have a timeline to share for the fix, but it’s tracked on our end. Sorry for the disruption to your workflow.

After testing, Solution 2 did not fix the garbled text issue. I used Solution 1 instead.