Dean,
I didn't have any cursorignore or rules. This sometimes happens on brand new conversations, and sometimes on long ones. I was mainly working on PowerShell scripts and Markdown. Unfortunately I can't share the files (enterprise customer here).
Interestingly, I think I was able to root-cause and fix this. I literally asked the agent "what's keeping you from editing the files?" and it told me that my document had some non-breaking whitespace characters that were making StrReplace fail. I asked the agent (Opus 4.6 in this case) to find and fix it:
There it is: c2 a0 is a UTF-8 non-breaking space (U+00A0) between ## and Code. That's why StrReplace can't match. Let me fix that first.
The agent started working normally after this.
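For anyone who wants to reproduce the failure outside the agent, here is a minimal Python sketch. It assumes the edit tool does an exact substring match (I don't know how StrReplace is actually implemented), and the heading line is made up for illustration:

```python
# Hypothetical file content: the heading contains U+00A0 instead of a regular space.
file_text = "##\u00a0Code\nSome text\n"

# What the agent searches for: the same heading typed with a regular space (U+0020).
search = "## Code"

# The exact match fails even though both strings look identical on screen.
print(search in file_text)  # False

# Dumping the first bytes shows the NBSP as c2 a0 between "##" and "Code".
print(file_text.encode("utf-8")[:8].hex(" "))  # 23 23 c2 a0 43 6f 64 65

# After replacing the NBSP with a regular space, the same search succeeds.
fixed_text = file_text.replace("\u00a0", " ")
print(search in fixed_text)  # True
```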
I then asked Gemini to explain this issue in detail (1), and then to provide some rules so this doesn't happen again (2). I haven't seen the issue again, but I haven't done much yet. I'll report if it comes back.
It would be very useful if this was something that Cursor applies on its own as a rule, or maybe when the response is being formatted and applied to the codebase.
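To make the request concrete, this is roughly the kind of fallback I have in mind. It is only a sketch of the idea, not how Cursor works: `tolerant_replace` and `SPACE_LIKE` are names I made up, and the set of "space-like" characters is an assumption.

```python
import re

# Characters treated as "a space" when matching (an assumed, non-exhaustive set).
SPACE_LIKE = " \u00a0\u202f"


def tolerant_replace(text: str, old: str, new: str) -> str:
    """Replace the first occurrence of `old`, tolerating look-alike spaces."""
    if old in text:
        # The exact match works, so no tolerance is needed.
        return text.replace(old, new, 1)

    # Build a pattern in which every space-like character of `old`
    # matches any space-like character in `text`.
    parts = []
    for ch in old:
        if ch in SPACE_LIKE:
            parts.append("[" + SPACE_LIKE + "]")
        else:
            parts.append(re.escape(ch))
    pattern = re.compile("".join(parts))

    # A function replacement keeps backslashes in `new` literal.
    result, count = pattern.subn(lambda m: new, text, count=1)
    if count == 0:
        raise ValueError("target text not found")
    return result
```

With this, `tolerant_replace("##\u00a0Code\n", "## Code", "## Source")` returns `"## Source\n"`, whereas a plain exact match would have found nothing to replace.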
1
You said:
I'm using agentic coding and the file edits keep failing due to some white space issues. Is there any tool I can use to find and replace things like UTF non breaking spaces and the like?
This is a common frustration with agentic coding. LLMs often "hallucinate" whitespace or copy formatting from their training data (like web docs or PDFs) that includes invisible characters like Non-Breaking Spaces (NBSP) or Zero-Width Spaces (ZWSP). These characters look identical to regular spaces (or are not visible at all), but they break exact-match edits and can cause syntax errors in many programming languages.
Here are the best ways to find and destroy them:
1. The VS Code "Native" Way (Quickest Fix)
You don't necessarily need a new tool if you use VS Code. You can visualize and replace these globally using Regex.
- To See Them: Go to Settings and search for Render Whitespace. Set it to all. Regular spaces appear as small dots; non-breaking spaces may be drawn differently (or highlighted) depending on your VS Code version and theme.
- To Find & Replace (Project-wide):
  - Open the Global Search (Ctrl+Shift+F or Cmd+Shift+F).
  - Click the Use Regular Expression icon (the .* button).
  - In the search box, paste: \x{00A0}|\x{200B}|\x{202F}
    - \x{00A0} is the Non-Breaking Space.
    - \x{200B} is the Zero-Width Space.
    - \x{202F} is the Narrow Non-Breaking Space.
  - In the Replace box, type a single standard space. (A zero-width space is usually better deleted than turned into a space, so consider searching for \x{200B} separately with an empty Replace box.)
  - Run Replace All.
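The same find-and-replace can be sketched in Python for text that never passes through VS Code. This is an illustration only: the function names are hypothetical, and unlike the steps above it deletes the zero-width space instead of replacing it with a space.

```python
import re

# The same three characters as in the search pattern above.
NBSP_LIKE = re.compile("[\u00a0\u202f]")  # replaced with a regular space
ZERO_WIDTH = re.compile("\u200b")         # removed outright


def normalize_spaces(text: str) -> str:
    """Turn NBSP and narrow NBSP into regular spaces and drop ZWSP."""
    text = NBSP_LIKE.sub(" ", text)
    return ZERO_WIDTH.sub("", text)


def find_suspects(text: str):
    """Yield (line_number, column, code_point) for each suspicious character."""
    for line_no, line in enumerate(text.split("\n"), start=1):
        for match in re.finditer("[\u00a0\u200b\u202f]", line):
            yield line_no, match.start() + 1, f"U+{ord(match.group()):04X}"
```

For example, `list(find_suspects("ok\n##\u00a0Code"))` gives `[(2, 3, "U+00A0")]`, which points at line 2, column 3.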
2. Specialized VS Code Extensions
If this happens frequently, these extensions can highlight "bad" whitespace automatically:
- Gremlins: Highlights invisible characters and "look-alike" characters (like a Greek Question Mark that looks like a semicolon) with a red underline.
- Fix Irregular Whitespace: Automatically converts these characters to normal spaces every time you save the file.
- Invisible AI Character Detector: Specifically designed for text generated by LLMs to find hidden watermarks or formatting artifacts.
3. Command Line (Bulk Cleanup)
If you have a large project and want to clean everything at once, use a sed or perl one-liner in your terminal:
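On Linux/macOS:
```bash
# Find the NBSP (UTF-8 bytes c2 a0, octal 302 240) and replace it with a normal space.
# This form needs GNU sed; the BSD sed shipped with macOS expects an argument
# after -i and may not understand \x escapes.
find . -type f -name "*.py" -print0 | xargs -0 sed -i 's/\xc2\xa0/ /g'

# Portable alternative using perl (works on both Linux and macOS):
find . -type f -name "*.py" -print0 | xargs -0 perl -pi -e 's/\xc2\xa0/ /g'
```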
Using Python (Platform Independent):
You can run this quick script in your project root to sanitize all files:
```python
import os


def clean_files():
    for root, dirs, files in os.walk("."):
        for file in files:
            if file.endswith((".py", ".js", ".ts", ".html", ".css")):  # add your extensions
                path = os.path.join(root, file)
                with open(path, "rb") as f:
                    content = f.read()
                # Replace NBSP (UTF-8 bytes \xc2\xa0) with a regular space
                clean_content = content.replace(b"\xc2\xa0", b" ")
                if clean_content != content:
                    with open(path, "wb") as f:
                        f.write(clean_content)
                    print(f"Cleaned: {path}")


clean_files()
```
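Before rewriting anything, it can help to run a read-only pass that only reports what it finds. The sketch below is an assumption-laden illustration: the watch list, the extension list, and the names `audit_file` and `audit_tree` are all made up here, and files are assumed to be UTF-8.

```python
import os
import unicodedata
from collections import Counter

# Assumed watch list; extend it as needed.
SUSPECTS = {"\u00a0", "\u200b", "\u202f"}
EXTENSIONS = (".py", ".js", ".ts", ".md", ".ps1")  # hypothetical selection


def audit_file(path: str) -> Counter:
    """Count suspicious characters in one file, keyed by 'U+XXXX NAME'."""
    counts = Counter()
    # Undecodable bytes are replaced rather than raising, since this is only a report.
    with open(path, encoding="utf-8", errors="replace") as f:
        for ch in f.read():
            if ch in SUSPECTS:
                counts[f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}"] += 1
    return counts


def audit_tree(root: str = ".") -> dict:
    """Walk `root` and return {path: Counter} for files that contain suspects."""
    report = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(EXTENSIONS):
                path = os.path.join(folder, name)
                counts = audit_file(path)
                if counts:
                    report[path] = counts
    return report


if __name__ == "__main__":
    # Nothing is modified; each line lists a file, a count, and the character found.
    for path, counts in audit_tree().items():
        for label, n in counts.items():
            print(f"{path}: {n} x {label}")
```

Keeping the audit separate from the cleanup makes it possible to check which files and characters are affected before any bytes are rewritten.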
4. Why is this happening?
Many LLMs use a "greedy" tokenization process. When they suggest code based on documentation snippets found on the web, they can accidentally carry over the characters used in HTML layouts.
Pro-Tip: If you are using an agent like Aider or Claude Engineer, try adding a line to your .aider.conf.yml or your system prompt: "Strictly use standard ASCII whitespace (U+0020) and standard quotes. Do not use non-breaking spaces or smart quotes."
Would you like me to write a more robust Python script that recursively cleans your entire directory and logs exactly which "invisible" characters it finds?
2
To stop Cursor (or any agentic AI) from injecting those invisible "gremlins" into your code, you should add a specific directive to your .cursorrules file or your Global Rules for AI in the settings.
Here is a high-authority prompt designed to eliminate whitespace hallucinations and encoding artifacts:
The "Clean Code & Whitespace" Rule
Copy and paste this into your .cursorrules file:
# WHITESPACE & ENCODING STANDARDS
- **Strict Whitespace:** Use ONLY standard ASCII space characters (U+0020) for indentation and spacing.
- **Prohibited Characters:** NEVER use Non-Breaking Spaces (U+00A0), Zero-Width Spaces (U+200B), or any other "invisible" Unicode characters.
- **Indentation:** Always use standard [4 spaces / 2 spaces / tabs - CHOOSE ONE] as per the existing project style. Do not mix indentation styles.
- **Smart Characters:** Never use "Smart Quotes" (U+201C, U+201D) or em-dashes (U+2014). Use only standard ASCII straight quotes (' or ") and hyphens (-).
- **Line Endings:** Use standard LF (Unix) line endings unless the project explicitly uses CRLF.
- **Verification:** Before outputting a code block, mentally "sanitize" the text to ensure no formatting artifacts from web-based documentation or internal training data are included.
Why this specific prompt works:
- Hex Codes: By including the specific Unicode hex codes (U+00A0), you give the LLM a clear "negative constraint" that is harder for it to ignore than a general instruction like "don't use weird spaces."
- "Smart" Character Ban: Often, when an LLM fails a file edit, it's not just the spaces; it's also "curly" quotes or long dashes it picked up from a blog post or documentation site. This rule kills those too.
- Indentation Enforcement: Agentic tools often fail "search and replace" blocks because they guess the indentation wrong (e.g., they provide 3 spaces when the file uses 4). Forcing a choice in the prompt prevents this mismatch.
Pro-Tip for Cursor Users
If you find that the AI is still struggling with "Search/Replace" failures, you can add this line to your rules as well:
"When providing code edits, ensure the context lines (the lines before and after the change) exactly match the whitespace and indentation of the original file to ensure the diff-engine can locate the target block."
Would you like me to show you how to set this up as a permanent global rule in Cursor's settings so you don't have to add it to every project?