AI edits corrupt special characters in non-UTF-8 files (€ symbol becomes ? in Windows-1252 files)

Where does the bug appear (feature/product)?
Cursor IDE

Describe the Bug
When Cursor AI applies edits to files encoded in Windows-1252, the € symbol (euro sign, byte 0x80) gets corrupted and replaced with “?”. This appears to be an architectural limitation where AI-generated text is always UTF-8 encoded before being inserted into files.
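
To make the byte-level picture concrete, here is a small Python sketch. It only shows how the euro sign is stored in each encoding and one way a lossy conversion can end up as "?". The lossy step is an assumption about the failure mode, not a confirmed description of Cursor's internal pipeline.

```python
# Illustration only: how the euro sign is stored in each encoding.
EURO = "\u20ac"  # the € character

cp1252_bytes = EURO.encode("cp1252")  # a single byte, 0x80
utf8_bytes = EURO.encode("utf-8")     # three bytes, 0xE2 0x82 0xAC

# Assumed failure mode (not confirmed): encoding into a charset that has
# no euro sign with errors="replace" silently substitutes "?".
lossy = EURO.encode("latin-1", errors="replace")

print(cp1252_bytes, utf8_bytes, lossy)
```

Whatever the real cause is, the symptom matches a conversion step that cannot represent € and falls back to a replacement character instead of failing.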

Steps to Reproduce

  1. Create a workspace with files.encoding set to windows1252 in settings
  2. Open a PHP file containing the € symbol (e.g., "Prezzo: € 1.000")
  3. Ask Cursor AI to make any modification to that file
  4. Accept/Apply the AI changes
  5. All € symbols in the modified sections become “?”

What I’ve already tried (nothing worked)

  • Setting files.encoding: windows1252 in .vscode/settings.json
  • Setting files.autoGuessEncoding: false
  • Creating a multi-root workspace (.code-workspace) with per-folder encoding settings
  • Adding .cursorrules instructing AI to preserve € symbols
  • Setting files.eol to CRLF

Expected Behavior
Cursor should detect the target file’s encoding and convert AI-generated text accordingly before applying edits, preserving special characters like €.
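
A minimal sketch of the expected behavior, in Python for brevity. The helper name `apply_edit` and its parameters are hypothetical and are not part of any Cursor or VS Code API; the point is only that the edited text is encoded with the file's own encoding and that unrepresentable characters raise an error rather than turning into "?".

```python
from pathlib import Path


def apply_edit(path: Path, new_text: str, encoding: str = "cp1252") -> None:
    """Hypothetical helper: write edited text back in the file's own encoding."""
    # errors="strict" raises UnicodeEncodeError instead of writing "?"
    # when the text contains a character the target encoding lacks.
    data = new_text.encode(encoding, errors="strict")
    # write_bytes avoids any newline translation, so CRLF/LF is preserved.
    path.write_bytes(data)
```

For example, `apply_edit(Path("price.php"), "Prezzo: € 1.000")` would store the euro sign as the single byte 0x80, which is what a Windows-1252 file expects.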

Operating System
Windows 11

Version Information
Version: 2.4.23 (system setup)
VSCode Version: 1.105.1
Commit: 379934e04d2b3290cf7aefa14560f942e4212920
Date: 2026-01-29T21:24:23.350Z
Build Type: Stable
Release Track: Default
Electron: 39.2.7
Chromium: 142.0.7444.235
Node.js: 22.21.1
V8: 14.2.231.21-electron.0
OS: Windows_NT x64 10.0.26200

For AI issues: which model did you use?
All models (tested with Opus 4.5, same issue)

Additional Information
This makes Cursor AI unusable for legacy projects that cannot be migrated to UTF-8. Many enterprise/legacy codebases in Europe use Windows-1252 encoding and contain currency symbols (€) throughout the codebase.

Is this an architectural limitation? Is there any plan to support encoding conversion for AI edits?

Does this stop you from using Cursor?
Yes - Cursor is unusable (for this specific legacy project)

Hey, thanks for the detailed report. This is a known issue: the Agent in 2.4.x forces all files to be saved as UTF-8 and ignores the original encoding.

You’re not alone. There are already 8+ similar threads for Windows-1252, EUC-KR, and GB2312/GBK. The team is aware and the bug is logged.

Workarounds:

  1. Manually re-save after every AI edit:

    • After the Agent changes the file: bottom-right corner of the editor → click the encoding → “Reopen with Encoding” → Windows-1252
    • Then “Save with Encoding” → Windows-1252
    • Annoying, but it works
  2. Try CTRL+K (inline edit) instead of Agent. It might break encoding less often, but it’s not guaranteed

I see you tried .cursorrules, .code-workspace, and files.encoding. You’re right, this is an architectural issue in how Agent applies edits. Settings are ignored right now.

Thanks for confirming — that matches what I’m seeing.

Could you share the public issue ID / tracker link for this bug, and whether there’s an ETA (or at least the target version) for a fix? This is a hard blocker for legacy Windows-1252 codebases (common in EU enterprise projects).

Manual “Reopen/Save with Encoding” after every AI edit is not a viable workaround in real workflows.

Also: is the fix planned specifically for Agent mode only, or for all AI-applied edits (including inline edits / Ctrl+K)?

I’m sorry, but the workaround of reopening the file isn’t working for me. The problem is already quite disruptive because it ends up altering important, working parts of the code: it might treat a character like “-” inside a .split call as special and change it as well.

Thanks for the update. Yeah, you’re right. The “Reopen/Save with Encoding” workaround doesn’t help because the corruption happens on the server side before the code comes back to the editor. By the time you see the diffs, the characters have already been replaced with “?”.

What actually works right now (confirmed by other users):

  • Roll back to version 2.3.41 from the Cursor download page and turn off auto-update
  • A few users in similar threads confirmed the issue doesn’t happen on 2.3.x

I know it’s not ideal, but it’s the only option for legacy codebases until a fix is released.

Thanks a lot for the thorough follow-up.

I’ll roll back to 2.3.41 and disable auto-update as suggested, since this is currently a hard blocker for Windows-1252 legacy codebases.

Really appreciate the help and the confirmations from the community. Cursor is genuinely very valuable in my workflow, so I’m looking forward to a proper fix in 2.4+ when it lands.

If there’s any public tracker/issue link or release note I should watch, please share it — I’ll happily retest as soon as a build includes the fix. Thanks again!

I had the same problem. Solved adding a new user rule:

Always preserve UTF-8 encoding and special characters (accents, ç, ñ, etc.) when modifying code. Never replace special characters with question marks or other symbols. All file edits must maintain the original character encoding and preserve all Catalan characters (à, è, é, í, ò, ó, ú, ç, ñ, etc.) exactly as they appear in the original code.

Hi!

I can confirm that for accents and many special characters, user rules help on my side too.
Unfortunately, for the € symbol (Windows-1252, byte 0x80) nothing has worked: as mentioned above, the corruption seems to happen server-side when applying edits (especially in Agent / 2.4.x). By the time the diff comes back to the editor, the character has already been replaced with “?”, so rules can’t really prevent it.

If you don’t mind, could you try a quick test on your side?
The exact repro steps are in my post above (a Windows-1252 file with something like Prezzo: € 1.000, then any AI edit and apply).
I’d love to know whether, with your user rule, the € stays intact or still gets turned into “?”.

Thanks!

Hello,

My files are originally UTF-8 encoded, so it is not exactly the same issue you experienced. I tried with the € symbol and it works well.

The second test I ran for you was to generate an ANSI-encoded file (I think that is the same as the Windows-1252 encoding mentioned) with accents and the ‘€’ symbol. In this case:

a) Before any agent modification, the special characters are already displayed incorrectly in the Cursor IDE
b) After an agent modification, the file becomes UTF-8 encoded, and ALL the special characters are lost.

Both a) and b) occur with my user rule active, so in this case the user rule does not seem to help.
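
For anyone who wants to check whether a file was silently converted, here is a rough Python heuristic. The function name `looks_like_utf8` is made up for this example, and the check is only a sketch: a few Windows-1252 byte sequences also happen to be valid UTF-8, so treat the result as a hint rather than proof.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if data has non-ASCII bytes and decodes as strict UTF-8."""
    if data.isascii():
        # Pure ASCII is byte-identical in UTF-8 and Windows-1252,
        # so it tells us nothing about a conversion.
        return False
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        # Typical for Windows-1252 text: a lone 0x80 (the € byte)
        # is not a valid UTF-8 sequence.
        return False
```

Running this on the raw bytes of a file before and after an Agent edit (for example via `Path(name).read_bytes()`) should show whether it flipped from Windows-1252 to UTF-8.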

I hope this helps.

Thanks a lot for testing this so thoroughly — super helpful.

That confirms what I’m seeing: the user rule may help when the file is already UTF-8, but it doesn’t solve the real blocker for ANSI / Windows-1252 files. In your ANSI test, Cursor already mis-displays special chars before any Agent edit, and after an Agent modification the file gets converted to UTF-8 and the special characters are lost anyway — even with the rule enabled.

So it looks like this isn’t something user rules can fix, and it’s likely an Agent/apply-edits pipeline / encoding handling issue (possibly server-side) rather than prompt guidance.

For now, the only viable workaround we’ve found is rolling back to Cursor 2.3.41 and disabling auto-update for legacy Windows-1252 codebases, until a proper fix lands. If you hear of any public issue/tracker link or a build that specifically mentions encoding preservation for Agent edits, please share — I’m happy to retest immediately.

Thanks again!