I’ve encountered an issue when using the “Apply” button to display code change predictions. Occasionally, instead of showing the expected changes, the prediction incorrectly modifies unrelated parts of the code. Specifically, it sometimes converts portions of Japanese string literals into “��” characters.
This bug appears to be intermittent and doesn’t always occur. However, when it does happen, it affects areas of the code that should not be modified at all, rather than the sections where changes are expected.
Steps to reproduce:
1. Open a file containing Japanese string literals in Cursor.
2. Use AI Chat or Composer.
3. Click the “Apply” button to generate change predictions.
4. Observe that sometimes, instead of showing the relevant changes, parts of Japanese strings in unrelated areas of the code are corrupted into “��” characters.
Expected behavior: The “Apply” button should only show predictions for relevant code changes and should not modify or corrupt existing Japanese string literals.
Actual behavior: Occasionally, clicking “Apply” results in corrupted Japanese characters (��) in unrelated parts of the code.
Additional information:
This issue occurs intermittently and is not consistently reproducible.
It only affects Japanese string literals, not other parts of the code.
I also have a similar issue with apostrophes: when I hit “Apply”, it replaces the curly apostrophe (’) with a straight single quote ('), even though I didn’t ask it to edit that part of the code.
This issue might be related to the tokenizer splitting a single multi-byte UTF-8 character across several tokens, which are then streamed as separate messages and not properly reassembled by Cursor.
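To illustrate the tokenizer theory, here is a minimal Python sketch. It assumes the streamed text arrives as raw UTF-8 byte chunks whose boundaries can fall in the middle of a character; the chunk boundaries are made up for illustration, and this is not a claim about how Cursor actually works internally. Decoding each chunk on its own turns a split character into U+FFFD replacement characters (shown as “�”), while an incremental decoder holds back incomplete bytes until the rest of the character arrives.

```python
import codecs

text = "こんにちは"
data = text.encode("utf-8")  # each hiragana character here is 3 bytes in UTF-8

# Hypothetical chunk boundaries that cut through multi-byte characters
chunks = [data[0:4], data[4:8], data[8:]]

# Naive approach: decode every chunk independently, replacing invalid bytes with U+FFFD
naive = "".join(chunk.decode("utf-8", errors="replace") for chunk in chunks)

# Safer approach: an incremental decoder buffers an incomplete byte sequence
# at the end of one chunk and completes it with the next chunk
decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
fixed = "".join(decoder.decode(chunk) for chunk in chunks)
fixed += decoder.decode(b"", final=True)  # flush anything still buffered
```

Here `naive` ends up containing U+FFFD characters wherever a chunk boundary split a character, while `fixed` matches the original string.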
It has been a bit of a hassle, as I’ve had to manually fix the corrupted characters every time. It would be amazing if the team could prioritize this fix. It would save a lot of time and frustration!
Same exact issue here. It’s so weird to see Chinese, Korean, and Japanese users all reporting the same thing at the same time.
Many Chinese users reported this issue five months ago, with no reply and no solution. The problem is caused by Cursor trying to match the generated result against the file and swap in the diff. Some experienced developers have said that Cursor trained a small language model to do that, perhaps with its own unique text encoder, and that this is the culprit behind the whole issue. Please, devs, make this issue a priority!