Japanese Characters Occasionally Corrupted When Using "Apply" Button

nikechan · August 21, 2024, 1:02pm

I’ve encountered an issue when using the “Apply” button to display code change predictions. Occasionally, instead of showing the expected changes, the prediction incorrectly modifies unrelated parts of the code. Specifically, it sometimes converts portions of Japanese string literals into “��” characters.

This bug appears to be intermittent and doesn’t always occur. However, when it does happen, it affects areas of the code that should not be modified at all, rather than the sections where changes are expected.

Steps to reproduce:

1. Open a file containing Japanese string literals in Cursor.
2. Use AI chat or Composer.
3. Click the “Apply” button to generate change predictions.
4. Observe that sometimes, instead of showing relevant changes, parts of Japanese strings are corrupted into “��” characters in unrelated areas of the code.

Expected behavior: The “Apply” button should only show predictions for relevant code changes and should not modify or corrupt existing Japanese string literals.

Actual behavior: Occasionally, clicking “Apply” results in corrupted Japanese characters (��) in unrelated parts of the code.

Additional information:

- This issue occurs intermittently and is not consistently reproducible.
- It only affects Japanese string literals, not other parts of the code.

[Screenshot: CleanShot 2024-08-21 at 15.05.17]
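Until this is fixed, one stopgap is to check an Apply result for “�” before accepting it. Below is a minimal TypeScript sketch of such a check. `findIntroducedReplacementChars` is a hypothetical helper, not a Cursor feature or API, and it assumes you can get the file text before and after the Apply as plain strings. It reports the 1-based line numbers in the new text that contain U+FFFD (the replacement character rendered as “�”) and do not appear verbatim in the original, so lines that already contained “�” are not flagged.

```typescript
// Hypothetical helper, not part of Cursor: flags lines where an Apply
// result contains U+FFFD ("�") that the original file did not already have.
const REPLACEMENT_CHAR = "\uFFFD";

function findIntroducedReplacementChars(before: string, after: string): number[] {
  // Original lines that already contained U+FFFD are treated as pre-existing.
  const preexisting = new Set(
    before.split("\n").filter((line) => line.includes(REPLACEMENT_CHAR)),
  );
  const flagged: number[] = [];
  after.split("\n").forEach((line, index) => {
    if (line.includes(REPLACEMENT_CHAR) && !preexisting.has(line)) {
      flagged.push(index + 1); // report 1-based line numbers
    }
  });
  return flagged;
}

// Simulated example: the Japanese literal on line 1 was corrupted.
const original = 'const title = "日本語のタイトル";\nconst count = 1;';
const applied = 'const title = "日本語の\uFFFD\uFFFDトル";\nconst count = 2;';
console.log(findIntroducedReplacementChars(original, applied)); // [ 1 ]
```

Because it compares whole lines, a line that was both corrupted and intentionally edited is still flagged, while a pre-existing “�” line that Apply leaves untouched is ignored.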

fun_strange · August 21, 2024, 2:00pm

I also have a similar issue with apostrophes: when I hit “Apply”, it replaces typographic apostrophes (’) with straight single quotes ('), even though I didn’t ask it to edit that part of the code.

Tomoshi3104 · November 16, 2024, 6:01am

I have the same “��” issue on version 0.42.5.

muddlebee · December 2, 2024, 7:39am

A similar issue was reported in “Weird ascii characters in prompt output from new claude-3.5 sonnet model”.

@admins please take a look.

Zhenyi-Wang · December 27, 2024, 2:06am

Same here with Chinese characters.

For example, 日志管理 can come out as 日��管理.

This issue might be caused by the tokenizer splitting a multi-byte UTF-8 character across several tokens, which are then streamed as separate messages and not properly reassembled by Cursor.
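The hypothesis above can be illustrated with a minimal TypeScript sketch. It assumes, purely for illustration, that streamed output arrives as raw UTF-8 byte chunks; nothing in this thread confirms how Cursor transports or decodes Apply output, and `decodeEachChunk` and `decodeStreaming` are hypothetical names. In UTF-8, 志 is three bytes (E5 BF 97), so if a chunk boundary falls inside it, decoding each chunk on its own yields U+FFFD replacement characters, while a single `TextDecoder` called with `{ stream: true }` holds the incomplete bytes until the next chunk completes them.

```typescript
// Hypothetical illustration only, not Cursor's actual code.
// Assumes each streamed message carries a raw UTF-8 byte chunk.

// Naive approach: decode every chunk independently. A multi-byte
// character split across two chunks becomes U+FFFD replacement characters.
function decodeEachChunk(chunks: Uint8Array[]): string {
  return chunks.map((chunk) => new TextDecoder("utf-8").decode(chunk)).join("");
}

// Streaming approach: one decoder instance for the whole stream.
// { stream: true } keeps an incomplete trailing byte sequence buffered
// until the next chunk supplies the remaining bytes.
function decodeStreaming(chunks: Uint8Array[]): string {
  const decoder = new TextDecoder("utf-8");
  let text = "";
  for (const chunk of chunks) {
    text += decoder.decode(chunk, { stream: true });
  }
  // A final call with no input flushes anything still buffered.
  text += decoder.decode();
  return text;
}

// Split "日志管理" after byte 5, so 志 (E5 BF 97) straddles two chunks.
const bytes = new TextEncoder().encode("日志管理");
const chunks = [bytes.slice(0, 5), bytes.slice(5)];

console.log(decodeEachChunk(chunks)); // replacement characters where 志 should be
console.log(decodeStreaming(chunks)); // 日志管理
```

The design point is that the decoder, not the caller, has to carry state between chunks: creating a fresh decoder per message (or converting each message to a string independently) discards the partial character.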

It has been a bit of a hassle, as I’ve had to manually fix the corrupted characters every time. It would be great if the team could prioritize this fix; it would save a lot of time and frustration!

Version: 0.44.8
VSCode Version: 1.93.1
Commit: f3b5a63019e4e2283033b4db987a35f8413c7570
Date: 2024-12-22T05:48:08.427Z
Electron: 30.5.1
ElectronBuildId: undefined
Chromium: 124.0.6367.243
Node.js: 20.16.0
V8: 12.4.254.20-electron.0

brandonwie · December 27, 2024, 6:09am

Same here with Korean characters.
Apply suggests corrupted Korean strings even where the original text is fine.

Version: 0.44.8
VSCode Version: 1.93.1
Commit: f3b5a63019e4e2283033b4db987a35f8413c7570
Date: 2024-12-22T05:48:08.427Z
Electron: 30.5.1
Chromium: 124.0.6367.243
Node.js: 20.16.0
V8: 12.4.254.20-electron.0
OS: Darwin arm64 24.2.0

Zhenyi-Wang · January 8, 2025, 1:57am

Looks like it’s been fixed; it hasn’t popped up again for a while.

Related topics

Chinese characters display as "��" - Text becomes unreadable · Bug Reports · 6 replies · 274 views · December 28, 2024
Cursor's apply model unexpectedly converts typographic quotes and apostrophes on existing code · Bug Reports · 20 replies · 335 views · July 21, 2025
Apply stopped Applying · Discussions · 13 replies · 2069 views · December 9, 2024
Cursor 0.42.4生成的代码出现代码字符的问题 (Character problems in code generated by Cursor 0.42.4) · Bug Reports · 9 replies · 978 views · December 25, 2024
Korean comments in code sometimes turn into special symbols when applied · Bug Reports · 6 replies · 287 views · April 13, 2025