Where does the bug appear (feature/product)?
Cursor IDE
Describe the Bug
On Windows systems with a non-UTF-8 default code page (e.g., Chinese GBK/936), the JSON payload piped to hook stdin contains double-encoded non-ASCII characters.
Chinese text in the “prompt” field is garbled. For example, “测试一下” (4 Chinese characters) becomes “娴嬭瘯涓€涓嬶紝” — the UTF-8 bytes were misinterpreted as GBK and then re-encoded as UTF-8.
ASCII parts of the JSON (keys, braces, English text) are unaffected. Only non-ASCII string content is corrupted.
Verified using kernel32!ReadFile P/Invoke to read raw pipe bytes — the double-encoding exists in the pipe BEFORE any PowerShell/.NET processing.
Steps to Reproduce
- Use a Windows system with system locale set to Chinese Simplified (code page 936/GBK). Do NOT enable “Use Unicode UTF-8 for worldwide language support”.
- Create a hooks.json with a beforeSubmitPrompt hook
- In the hook script, read stdin and hex-dump the raw bytes
- Send a prompt containing Chinese characters (e.g., “测试中文”)
- Observe the hex dump: expected UTF-8 for “测” is E6-B5-8B, but actual pipe bytes show E5-A8-B4 (double-encoded)
Expected Behavior
The hook’s stdin pipe should contain valid UTF-8 JSON regardless of the Windows system code page. Non-ASCII characters in string fields (prompt, etc.) should be encoded as standard UTF-8.
Operating System
Windows 10/11
Version Information
IDE:
Version: 3.2.11
Commit: e9ee1339915a927dfb2df4a836dd9c8337e17cc0
For AI issues: which model did you use?
N/A
For AI issues: add Request ID with privacy disabled
N/A
Additional Information
Affects all Cursor users on non-UTF-8 Windows (Chinese/Japanese/Korean locale).
Root cause: When spawning hook child processes, the stdin pipe write likely uses the system default code page instead of UTF-8 for encoding the JSON string.
Suggested fix: Ensure child.stdin.write(json, ‘utf8’) or equivalent explicit UTF-8 encoding in the hook execution code path.
Workaround: We reverse the double-encoding in the hook script (decode UTF-8 → encode GBK → decode UTF-8), which recovers ~95% of text but is lossy for certain byte alignments (e.g., fullwidth colon + ASCII slash at specific positions).
This encoding issue also affects:
- Shell command stdout - when Agent runs commands like Get-Content
on UTF-8 files, the terminal output is garbled (see screenshots) - This happens with all models (Claude, GPT) and both IDE Agent
and Codex/Background Agent - Root cause is the same: child process I/O uses system code page
(GBK 936) instead of UTF-8 on non-UTF-8 Windows systems
Does this stop you from using Cursor
Sometimes - I can sometimes use Cursor