Windows: Cursor Hooks stdin corrupts UTF-8 text before hook script receives it

Where does the bug appear (feature/product)?

Cursor IDE

Describe the Bug

On Windows, Cursor Hooks receive corrupted Korean UTF-8 text through stdin.

The corruption happens before my hook script sends anything to my backend. I verified this by
logging the raw stdin bytes, UTF-8-decoded stdin string, and parsed JSON inside the hook script.

Example Korean prompt entered in Cursor:

다시한번 백엔드 로그를 찾아보자. 안녕.

Expected hook payload:

{
  "prompt": "다시한번 백엔드 로그를 찾아보자. 안녕."
}

Actual hook stdin payload already contains corrupted text:

?ㅼ떆?쒕쾲 諛깆뿏??濡쒓렇瑜?李띿뼱蹂댁옄. ?덈뀞.

The backend is not the cause. The hook script receives the corrupted value before JSON.parse() and before making any HTTP request.

Additional evidence: the transcript_path JSONL file contains the correct Korean text, but the hook
stdin payload contains corrupted Korean text.

Steps to Reproduce

  1. Configure a Cursor Hook for beforeSubmitPrompt.
  2. Use a hook script that reads raw stdin bytes and logs:
    • stdin byte length
    • stdin hex prefix
    • Buffer.concat(chunks).toString('utf8')
    • parsed payload.prompt
  3. Submit a Korean prompt in Cursor:

다시한번 백엔드 로그를 찾아보자. 안녕.

  4. Check the hook script logs.

Minimal debug script:

const buffers = [];

// Collect raw stdin chunks as Buffers so nothing is decoded before logging.
process.stdin.on('data', chunk => {
  buffers.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
});

process.stdin.on('end', () => {
  const fullBuffer = Buffer.concat(buffers);
  console.log('stdin byteLength:', fullBuffer.length);
  console.log('stdin hex prefix:', fullBuffer.subarray(0, 256).toString('hex'));
  console.log('stdin utf8:', fullBuffer.toString('utf8'));

  // Strip surrounding whitespace and a leading BOM, if any, before parsing.
  const payload = JSON.parse(
    fullBuffer.toString('utf8').trim().replace(/^\uFEFF/, '')
  );

  console.log('parsed prompt:', payload.prompt);
});

Observed log:

stdin utf8:
{"prompt":"?ㅼ떆?쒕쾲 諛깆뿏??濡쒓렇瑜?李띿뼱蹂댁옄. ?덈뀞.", …}

parsed prompt:
?ㅼ떆?쒕쾲 諛깆뿏??濡쒓렇瑜?李띿뼱蹂댁옄. ?덈뀞.

The raw hex also shows that the corrupted string is already encoded in stdin. For example, the
prompt value contains bytes like:

3f e3 85 bc …

This means the hook receives ?ㅼ… as UTF-8 text, not the original Korean UTF-8 bytes.
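To make the hex evidence easier to check, here is a small sketch that prints the UTF-8 bytes of the first character of the original prompt next to the bytes of the corrupted prefix. The helper name utf8Hex is made up for this illustration and is not part of Cursor or the hook API.

```javascript
// Hypothetical helper: space-separated hex dump of a string's UTF-8 encoding.
function utf8Hex(text) {
  const hex = Buffer.from(text, 'utf8').toString('hex');
  return (hex.match(/../g) || []).join(' ');
}

// First character of the original prompt ("다", U+B2E4).
console.log('expected:', utf8Hex('다'));   // eb 8b a4
// First two characters of the corrupted value seen on stdin ("?" + "ㅼ", U+317C).
console.log('observed:', utf8Hex('?ㅼ'));  // 3f e3 85 bc
```

If stdin carried the original text, the prompt value would start with eb 8b a4. It starts with 3f e3 85 bc instead, which is the UTF-8 encoding of the already-corrupted characters.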

Operating System

Windows 10/11

Version Information

Version: 3.6.31

Does this stop you from using Cursor

No - Cursor works, but with this issue

Hey, awesome report. The hex proof and the minimal repro help a lot.

This is a known bug. On Windows with a non-UTF-8 system code page (Korean CP949 in your case), the payload that Cursor reads via PowerShell gets interpreted as ANSI instead of UTF-8. That’s why you see mojibake in stdin. The temp file itself is written as valid UTF-8; it breaks specifically at read time. This is the same class of issue already reported for Chinese and Cyrillic.

We’re tracking it, but I can’t share an exact fix timeline yet. I’ll add your case to the existing report. It confirms the issue is still present in 3.6.31.

As a workaround, you can try enabling system-wide UTF-8 in Windows: Region settings → Administrative language settings → Beta: Use Unicode UTF-8 for worldwide language support → reboot. That switches the system code page to UTF-8 (65001), and reading should then work correctly. This is a system setting, so keep in mind that it affects your whole environment, not just Cursor. Let me know if it helps.

I’ll mark this thread as a duplicate of Hook stdin pipe double-encodes non-ASCII on non-UTF-8 Windows so we keep everything in one place.

Thank you for the detailed explanation and for confirming the bug.

Regarding the suggested workaround, changing the system-wide UTF-8 setting requires approval from our security officer due to corporate policy, making it difficult to apply right away.

As an alternative temporary workaround, I have modified my hook script to read the data directly from the transcript_path file instead of relying on stdin.

Could you please confirm if proceeding with this transcript-reading approach as a workaround is safe and acceptable for the time being?

Thanks!

Yeah, reading transcript_path directly is a totally workable and safe workaround. The file itself is written as valid UTF-8. The issue is only on the PowerShell stdin reading side, so if your script opens the transcript file itself (in Node with utf8), it fully bypasses the layer that corrupts the text.

A couple notes until the bug is fixed:

  • For beforeSubmitPrompt, make sure the current prompt has already been written to the transcript by the time the hook runs (you’ve confirmed this). If you change the logic later, recheck that you’re reading the last entry.
  • The JSONL format in the transcript isn’t a stable public contract, so in theory it could change between versions. It’s fine as a temporary workaround, but I wouldn’t build anything long term that depends on it.
  • Only non-ASCII text fields (like prompt) are affected. ASCII fields (paths, IDs) come through stdin correctly.
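For reference, the transcript-reading workaround could look roughly like the sketch below. It is only an illustration under assumptions: the helper names readLastJsonlEntry and lastEntryFromPayload are made up, and because the transcript's JSONL layout is not a documented contract, the code returns the whole last entry and leaves it to the caller to locate the prompt text inside it.

```javascript
const fs = require('fs');

// Hypothetical helper: parse and return the last non-empty line of a JSONL
// file, read as UTF-8. Returns null when the file has no entries.
function readLastJsonlEntry(filePath) {
  const text = fs.readFileSync(filePath, 'utf8').replace(/^\uFEFF/, '');
  const lines = text.split(/\r?\n/).filter(line => line.trim() !== '');
  if (lines.length === 0) return null;
  return JSON.parse(lines[lines.length - 1]);
}

// Hypothetical helper: take transcript_path from the hook's stdin payload,
// then read the transcript file directly instead of trusting payload.prompt.
function lastEntryFromPayload(stdinText) {
  const payload = JSON.parse(stdinText.trim().replace(/^\uFEFF/, ''));
  return readLastJsonlEntry(payload.transcript_path);
}
```

One thing to watch: if transcript_path itself contains non-ASCII characters (for example, a user profile folder with a Korean name), the path is presumably subject to the same corruption, so this approach assumes an ASCII-only path.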

There’s also another option that might be easier in your case, since you can’t enable system-wide UTF-8: install PowerShell 7 and add pwsh to PATH. Cursor will pick it up instead of Windows PowerShell 5.1, and PS7 uses UTF-8 by default for all I/O, so the stdin double-encoding shouldn’t happen. Important: this is a per-user install and doesn’t touch system settings, so it shouldn’t need security officer approval. One caveat: it didn’t help one user in a similar case, so there are no guarantees, but it’s worth trying. If it works, you can stay on stdin without depending on the transcript format.

The bug is being tracked and your case is attached. I can’t share an ETA yet. Once there’s an update, I’ll reply in the thread. Let me know if the pwsh option works, or if the transcript approach hits any issues.