This is a screenshot of an in-line suggestion. The suggested code already exists on the next two lines.
This problem started when I began using the new Dart formatter, which introduces line breaks where they did not previously exist. I experimented with the OpenAI tokenizer, which produces different results when the same code is formatted differently. Further details, and a subsequent discussion with Claude, are in this GitHub issue.
opened 09:16AM - 15 Mar 25 UTC
closed 08:06PM - 26 Mar 25 UTC
Since the introduction of the new formatter I have noticed an issue with the inline predictions made by my AI coding assistant. I am using Cursor IDE (v0.46.11). Where a new line break has been inserted by the formatter (see issue #1668), the coding assistant predicts text that already exists on the subsequent line(s). Because I assumed that whitespace would be irrelevant to LLMs, my initial thought was that this might be a bug in Cursor. Here is an example in context (the grey text is the prediction).
<img width="612" alt="Image" src="https://github.com/user-attachments/assets/4c709387-f928-4911-8d0f-897567d9cfb3" />
I decided to verify my assumption that an LLM ignores whitespace when tokenizing differently formatted code. Using the [OpenAI tokenizer](https://platform.openai.com/tokenizer) I discovered that the code *is tokenized differently*.
<details><summary>Tokenization Differences</summary>
### Old Formatter
<img width="702" alt="Image" src="https://github.com/user-attachments/assets/ac1f4a84-47ff-475b-b6ec-618c4b0074f8" />
### New Formatter
<img width="707" alt="Image" src="https://github.com/user-attachments/assets/37486d6f-7b74-4946-81fc-626094081e32" />
</details>
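As a rough illustration of why formatting alone can change the token sequence, here is a minimal Python sketch. It does not use the OpenAI tokenizer: the `PIECE` pattern is a deliberately simplified stand-in for a real pre-tokenization step, and the `pieces` helper and both Dart snippets are hypothetical examples rather than the code from the screenshots above.

```python
import re

# Simplified splitter (an assumption for illustration only, NOT the real
# OpenAI tokenizer): a newline, a run of spaces/tabs, a word, or a single
# punctuation character each become one piece.
PIECE = re.compile(r"\n|[ \t]+|\w+|[^\w\s]")


def pieces(source: str) -> list[str]:
    """Split source text into whitespace-preserving pieces."""
    return PIECE.findall(source)


# Hypothetical Dart call, formatted in the older "short" style...
short_style = "foo(alpha, beta);"
# ...and in the newer "tall" style, with line breaks and a trailing comma.
tall_style = "foo(\n  alpha,\n  beta,\n);"

short_pieces = pieces(short_style)
tall_pieces = pieces(tall_style)

if __name__ == "__main__":
    print(len(short_pieces), short_pieces)
    print(len(tall_pieces), tall_pieces)
```

Even with this crude splitter, the newlines, the indentation, and the trailing comma all show up as extra pieces, so the two versions give a model different sequences even though they describe the same call. A real byte-pair tokenizer merges characters differently, but the screenshots above show the same kind of divergence.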
This led me to do further research on the changes introduced with the new formatter (#1253). A conversation with Claude Sonnet 3.7 led to the following conclusion (the usual caveats regarding AI-generated results apply). A link to the full conversation with Claude is at the end of the extract.
### Claude's Conclusion
"These changes directly explain your observation about prediction differences. With the new formatter:
* Lines are being broken differently
* Trailing commas are being added/removed automatically
* The visual structure of code has shifted toward tall style
When you observed the model predicting "text which was already present in my code but on the next line," this is likely because the new formatter is introducing line breaks where the old formatter wouldn't have, changing how the model perceives the context window.
Since coding assistants are sensitive to formatting patterns they've seen during training, ***these significant formatter changes would naturally lead to prediction misalignments until the models are updated with code formatted according to the new Dart 3.7 style guidelines.***" [Link to research](https://claude.ai/share/5c8381cb-f96a-431f-bc07-8a29f44ac608)