Inability to work with files with smart/curly quotes/apostrophes

I have some source files that contain a lot of smart/curly quotes, in presentational text where it would be odd to use the straight foot/inch marks commonly used as quotes in ASCII.

It is impossible to work with those files in Cursor with Sonnet 3.5. Sonnet 3.5 seems to understand that there are smart quotes in the text, but it does not seem able to output them, so any Composer edit to a file results in all of the smart quotes becoming inch/foot marks. This often makes a file syntactically invalid, as it will change a line such as const text = 'We can’t do this' into const text = 'We can't do this'.

I’ve tried this directly on claude.ai and it makes the same mistake as in chat (see images below). I also tried chatting with OpenAI models which didn’t seem to exhibit the same issue.

I sense that this is really a bug at Anthropic’s end, although I do wonder if this could be patched at the Cursor end somehow, at least for lines that are not really being modified other than the quotes changing.

Either way it makes it pretty much impossible to use Cursor to work with large chunks of our codebases with the Sonnet model that seems otherwise to have the best results.

Version: 0.45.4
VSCode Version: 1.96.2
Commit: d9f8a232158c173cb84b31a70a49a9689bf0f770
Date: 2025-01-26T07:23:35.719Z
Electron: 32.2.6
Chromium: 128.0.6613.186
Node.js: 20.18.1
V8: 12.8.374.38-electron.0
OS: Darwin arm64 24.2.0


Hey, can you try adding a .cursorrules file, and specify the quote you want in there? This might be enough to get the AI to start outputting the quote mark you want!
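Something like this might work as a starting point (untested, and the exact wording is just a suggestion):

```
- Preserve typographic (curly) quotes and apostrophes (’ ‘ “ ”) exactly as
  they appear; never replace them with straight ' or " marks.
- Only change quote characters on lines you have been explicitly asked to modify.
```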


I experience this issue a lot as well.

Sometimes, when I’m making a completely unrelated change to a file, the composer goes through and edits all instances of curly (typographic) apostrophes within the file, even when the task the composer is asked to complete is unrelated to that section of the file.

e.g. I ask it to update “section 2” of an HTML file (unrelated to apostrophes), and it will go through and update all of my typographic apostrophes in “section 1” of the file as well as making my requested edits to “section 2”.

I haven’t been able to overcome this completely, but I have minimised the issue by adding the following to my Rules for AI:

  • ALL user-facing text MUST use the typographic apostrophe ’ instead of the straight apostrophe ' (e.g., it’s instead of it's, we’ll instead of we'll).
  • If a typographic apostrophe exists within a word, it MUST be retained.

Thanks for replying Dan.

I can’t get it to do it even when explicitly asked in a chat window, so I don’t believe that a .cursorrules file will fix this. I’ll give it a shot with @jake’s suggestion though.

The problem as I understand it is that the Sonnet 3.5 model is literally unable to output a smart quote - even when asked to do only that (see the last screenshot where it describes the two different types of quote and puts the straight quotes as example characters for both cases).

It can “see” them, but it literally can’t “say” them. Perhaps the clue is in the phrase “typographic” apostrophes.

I think you might be right that it’s literally incapable of doing it. And that actually makes a lot of sense.

Although my testing in .cursorrules has resulted in improvements, these improvements are from Claude not editing parts of the file that it’s not supposed to, i.e. no longer swapping existing straight apostrophes to typographic apostrophes in other parts of the file that are outside what I’m trying to edit.

But when the edit does involve part of the code with a typographic apostrophe - for example, getting it to say “we’ll be back soon” on the front end - I haven’t had any luck.

(Knowing this is actually consoling, as I was beginning to doubt my prompt-engineering skills.)


Oddly enough, I’m getting different results from using the LLM UIs directly vs Cursor.

With Claude, it returns straight quotes via the UI or via Cursor. But with OpenAI, it returns curly quotes via the ChatGPT UI (and in its response in the composer window), but then can’t apply those changes.

So it might be an issue with both Cursor and Claude.

Tried @jake’s Rules for AI but it doesn’t seem to have any effect for me. Cursor cannot handle making a one-line edit to a file that has smart quotes in it (even when that one line doesn’t have smart quotes). I’m on Cursor 0.47.5 FWIW.

Confirming that this is not a Cursor issue. This is an Anthropic issue.

If you change your model to o3-mini, it works. The issue is that Claude literally cannot type a typographic quotation mark, as mentioned in the OP.

You can test this using Claude’s API directly. You can also use gpt-4o or o3-mini in Cursor to confirm that it works.

Thanks for the clarification and workaround. I was able to find an acknowledgement of this issue from an Anthropic employee in October 2024 who said, “Unfortunately, this is not something we’re able to fix in the near future”.

He recommended using a numerical character reference. I’ve got ~100 right single quotation marks (’) in my codebase. I’m considering just replacing them all with &#8217; or &rsquo;, its HTML entity, so I don’t have to worry about changing my model around when touching files with them.

Not quite sure how things are working under the hood, but I wonder if Cursor could convert smart quotes to their HTML entities before passing them to Anthropic and then convert the HTML entities back to smart quotes in the response (but only those that were auto-converted, of course, in case someone happened to have both ’ and &#8217; in their code).
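A rough sketch of what that round-trip might look like (hypothetical; I don’t know Cursor’s internals, and the numeric character references below are just the standard codes for the curly-quote characters):

```python
# Hypothetical pre/post-processing: escape smart punctuation as numeric
# character references (NCRs) before sending text to the model, then
# restore only the sequences we escaped.

SMART_CHARS = {
    "\u2018": "&#8216;",  # left single quote
    "\u2019": "&#8217;",  # right single quote / apostrophe
    "\u201c": "&#8220;",  # left double quote
    "\u201d": "&#8221;",  # right double quote
}

def escape_smart(text: str) -> str:
    """Replace smart-quote characters with their NCRs."""
    for ch, ncr in SMART_CHARS.items():
        text = text.replace(ch, ncr)
    return text

def unescape_smart(text: str) -> str:
    """Reverse escape_smart on the model's response."""
    for ch, ncr in SMART_CHARS.items():
        text = text.replace(ncr, ch)
    return text
```

As noted above, this is only safe if Cursor tracks which NCRs it introduced itself, since a file could legitimately contain both the raw character and its NCR.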


FWIW, I now have a Python script that does its best to undo quote-only edits. This has made it much more bearable to work in a codebase that has these.

Here it is, in case it’s useful inspiration (it relies on one dependency, unidiff): Script to undo smart -> simple quote changes in git working tree · GitHub
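For anyone who wants the general idea without reading the gist, here’s a minimal stdlib-only sketch of the same concept. It assumes the edit didn’t add or remove lines (so the two versions can be zipped line by line), which a real script can’t assume; the actual gist walks diff hunks via unidiff instead.

```python
# Sketch: revert any edited line whose only change was downgrading
# smart quotes/dashes to their straight ASCII equivalents.

PUNCT_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # smart single quotes -> '
    "\u201c": '"', "\u201d": '"',   # smart double quotes -> "
    "\u2013": "-", "\u2014": "-",   # en/em dash -> hyphen
})

def normalize(line: str) -> str:
    """Map smart punctuation to ASCII so quote-only diffs compare equal."""
    return line.translate(PUNCT_MAP)

def restore_quote_only_edits(original_lines, edited_lines):
    """Return edited_lines, but with lines whose only change was a
    punctuation downgrade reverted to their original form."""
    restored = []
    for orig, new in zip(original_lines, edited_lines):
        if new != orig and normalize(orig) == normalize(new):
            restored.append(orig)  # only punctuation changed: undo it
        else:
            restored.append(new)  # a real edit (or no edit): keep it
    return restored
```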

I have also noticed today that it does something similar with em-dashes, converting them to hyphens.

I’ve tried telling Cursor about this script, asking it to run it if it notices that it’s unintentionally changed the quote types, but I’ve not convinced it to yet.

I think the proper solution here is for Anthropic to revise how they do their tokenisation, but I suspect that’s quite deeply embedded in their models. However, their apply model could probably be designed to handle this much better.