File editing needs to be more robust

Where does the bug appear (feature/product)?

Cursor IDE

Describe the Bug

The provided tooling for editing files can’t handle multiple edits close to one another.

‘Failure’ cases involve the model trampling over its own edits.
‘Critical failure’ cases involve the model trampling over its own edits then declaring the file is in a state it isn’t.

The model should be able to consistently make fine-grained edits. The ‘context’ of edits should be robust to the model’s own edits (and ideally, the edits of a human interactively correcting those edits).
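To make "robust to the model’s own edits" concrete, here is a minimal sketch of a context-anchored edit that is always applied against the file’s current contents and refuses to guess. `AnchoredEdit`, `applyAnchoredEdit`, and the sample file contents are all hypothetical names and data for illustration. This is not how Cursor’s `edit_file` is implemented.

```typescript
// Hypothetical edit shape: replace one exact occurrence of `oldText` with `newText`.
interface AnchoredEdit {
  oldText: string;
  newText: string;
}

// Apply one edit against the file's *current* contents.
// Throws instead of guessing when the anchor is missing or ambiguous.
function applyAnchoredEdit(current: string, edit: AnchoredEdit): string {
  const first = current.indexOf(edit.oldText);
  if (first === -1) {
    throw new Error(`anchor not found: ${JSON.stringify(edit.oldText)}`);
  }
  if (current.indexOf(edit.oldText, first + 1) !== -1) {
    throw new Error(`anchor is ambiguous: ${JSON.stringify(edit.oldText)}`);
  }
  return (
    current.slice(0, first) +
    edit.newText +
    current.slice(first + edit.oldText.length)
  );
}

// Illustrative file contents (an assumption, not the real reproduction file).
const original = ["const keys = ['a', 'b'];", "const records = [];", ""].join("\n");

// Two edits on adjacent lines; the second is applied to the result of the first.
const afterFirst = applyAnchoredEdit(original, {
  oldText: "['a', 'b']",
  newText: "['a', 'b', 'A', 'B']",
});
const afterBoth = applyAnchoredEdit(afterFirst, {
  oldText: "const records = [];",
  newText: "const records: string[] = [];",
});
```

With this shape, a stale or repeated edit fails loudly with an error instead of silently overwriting an earlier change, which is the property the ‘critical failure’ cases lack.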

Models often make multiple edits across a single file, so this happens all the time in real-world codebases, not just in these reproductions. It massively amplifies the number of tokens required, often causing the model to re-read entire files. The time cost is amplified too, often resulting in multi-minute waits during which the editing experience is terrible because the model is thrashing the file.

Of course, since the context itself grows linearly, the resulting token bill grows quadratically. I upgraded to a Pro+ subscription just to do these tests!
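A rough sketch of that arithmetic, assuming every tool call re-sends the whole context and each call adds a fixed number of tokens to it. It ignores caching and output tokens, `totalInputTokens` is a hypothetical helper, and the numbers are illustrative rather than measured.

```typescript
// Total input tokens across `calls` tool calls when call i re-sends
// baseContext + i * growthPerCall tokens (i starting at 0).
function totalInputTokens(
  baseContext: number,
  growthPerCall: number,
  calls: number,
): number {
  let total = 0;
  for (let i = 0; i < calls; i++) {
    total += baseContext + i * growthPerCall;
  }
  return total;
}

// Closed form of the same sum: the calls * (calls - 1) / 2 term is the quadratic part.
function totalInputTokensClosedForm(
  baseContext: number,
  growthPerCall: number,
  calls: number,
): number {
  return calls * baseContext + (growthPerCall * calls * (calls - 1)) / 2;
}

console.log(totalInputTokens(10_000, 2_000, 3)); // 36000
console.log(totalInputTokens(10_000, 2_000, 12)); // 252000
```

Under these assumptions, going from 3 to 12 calls is 4x the calls but 7x the input tokens.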

To quote the model itself:

This is maddening. The application continues to undo my edits.

It’s difficult to debug this on the user end since the edits shown in the file card are nonsense; see “File edit card is pretty broken”.

I’ve had this issue happen with every model I’ve tried in real projects. This editing problem is ultimately a fundamental issue with the file editing system design, rather than with any particular model.

Steps to Reproduce

I made sure to disable privacy mode for all of these tests. I’ve been running them over the past few days, and although there have been several updates in that time, I’m still getting failures. It is also frustrating to have no visibility into what changes from version to version. When a version is released, the changelog should be public immediately, not days or weeks later.

Reproduction 1

Any request to edit two lines close to each other with multiple tool calls causes this problem. I’ve tried various prompts; this is one:

In two calls to edit_file, update the keys array to include capital letters, update the records = [] line to be a string array.

Here are a bunch of request IDs with their models and how they failed.

Each entry shows the outcome (success, fail, or critical fail, per the criteria above) with the number of tool calls in brackets.

gpt-5

2e5daefc-686a-4617-bf54-bfad7aa04b6d - fail (3)
0c24bcba-98c6-4a74-9f38-46f149312f2a - fail (9), over 5 minutes of real time and 260.8K total tokens for a 2-line edit.
edd050f1-d99e-4edb-b045-b01bc7374eb8 - fail (12)
bd160abc-e3b4-44ba-83e7-611c99ecae2e - fail (12), many minutes and hundreds of thousands of tokens.

gemini-2.5-pro

27dfa513-a208-4fc0-9371-e9abdb8e7fe2 - critical fail (2)
cf0cc3ef-6b44-4ff1-bf9c-a22746d93bcd - fail (3)
530b658c-6670-4658-a204-27d8c9389f57 - success!

gemini-2.5-flash

3c80e9dc-afe1-4196-8c14-92c4fd47a010 - fail (5)
531665d2-5646-414d-82a6-4c2a187d9d4f - success!
595d8b1c-e9d9-4e62-9c5b-4d1f7bb6d35a - fail (4)

claude-3.5-haiku

7bf871c7-e872-4fbd-8f60-5ed598170a23 - critical fail (2)
16578b41-f30b-4e6b-b334-c347acf8a5d9 - critical fail (2)
cb5f7c09-a140-4f4e-b534-a5d683aa6c93 - critical fail (2)
06cb0903-48b4-41a7-ba63-1bee8b3f4884 - critical fail (2)


claude-4.5-sonnet explicitly reads the file between edits to keep its state consistent.

b5e36bfe-6059-4832-ac46-18fd7c052604 - success
677b09b8-efc6-412f-9e81-a081aaaa9da9 - success
304db461-603a-4ba0-ae70-fdef7e8685c9 - success


claude-4.1-opus consistently succeeds without explicit reads in between by internally keeping track of the changes.

700fbf95-b52d-4c30-9a6d-c50bb7227375 - success
54e6047d-d4b1-43d4-b8e2-cdb2025e72e1 - success
5ee78087-1381-475c-bb0f-baf0b8b37865 - success


Reproduction 2

File

// minimal-test.ts

const a = "b";
const b = "b";
const c = "c";

Prompt

In two tool calls to edit_file, update 'a' to 'A', and update 'c' to 'C' in 'minimal-test.ts'.

claude-4.5-sonnet

b4a2a6c8-e3fd-4a3e-b046-5784f124226f - critical fail (2)
ad8131f3-0d13-4413-97b4-bc60fdbef138 - critical fail (4)

claude-4.1-opus

ad8131f3-0d13-4413-97b4-bc60fdbef138 - fail (3)
0067efe1-a2f8-41c1-a42c-3707e45a2712 - fail (5), used search_replace at the end

Expected Behavior

File editing should be robust.

Operating System

macOS

Current Cursor Version (Menu → About Cursor → Copy)

Version: 1.7.39
VSCode Version: 1.99.3
Commit: a9c77ceae65b77ff772d6adfe05f24d8ebcb2790
Date: 2025-10-08T00:33:20.352Z
Electron: 34.5.8
Chromium: 132.0.6834.210
Node.js: 20.19.1
V8: 13.2.152.41-electron.0
OS: Darwin arm64 24.6.0

For AI issues: which model did you use?

I’ve attached request IDs for the following models; this happens across all models I’ve tried.

gpt-5
gemini-2.5-pro
gemini-2.5-flash
claude-3.5-haiku
claude-4.5-sonnet
claude-4.1-opus

Does this stop you from using Cursor

Yes - Cursor is unusable

Thanks for the detailed report and request IDs. The team is already looking into a related issue (#136229). Smaller models sometimes have trouble using tools correctly, but the larger models are usually good at this, so this shouldn’t be happening. We’re looking into it.

To help debug this further, could you share:

  • Do you have any custom rules applied here, or is it a clean context? I tried the test file with claude-4.5-sonnet, and it worked correctly in one go.

  • Any extensions or custom configurations that might affect agent behavior?

I have a couple of rules, but they just describe how to run tests in the workspace, a desire not to repeat code summaries to the user in the chat, and a request not to ‘fix bugs found along the way’.

I have ast-grep enabled as an MCP server; other than that, it’s a typical TypeScript IDE setup.


I have updated to the latest version (but there’s no changelog for it or the previous version).

Version: 1.7.40
VSCode Version: 1.99.3
Commit: df79b2380cd32922cad03529b0dc0c946c311850
Date: 2025-10-09T02:55:11.735Z
Electron: 34.5.8
Chromium: 132.0.6834.210
Node.js: 20.19.1
V8: 13.2.152.41-electron.0
OS: Darwin arm64 24.6.0

It passes these reproduction tests now. Thank you.

As of 1.7.44, while it passes the reproduction tests above, I’m finding real edits are still failing.

The model will sometimes insert // ... existing code ... into the actual diff:

If I have the following test file

And just ask the LLM to replace one of those blocks:

Replace

```
{
  // Class generic constraint, generic is factory type
  class Container<R extends () => Reporter<string>> {
    constructor(public reporterFactory: R) {}
  }

  new Container(() => new Reporter()); // does not error, inferred as Reporter<string>
  new Container<() => Reporter<string>>(() => new Reporter<null>()); // errors

  const reporter = new Reporter();
  new Container(() => reporter); // errors

  const reporterFactory = () => new Reporter();
  new Container(reporterFactory); // errors
}
```

with

```
// test comment
```

with the edit_file tool

The models consistently fail.

81ec806d-5ea3-48f1-994f-e89e1cbb5828 - gpt-5 critically failing
e4f73134-b60f-4ce7-9a59-46fda3820d73 - gemini-2.5-pro failing
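As a stopgap for the leaked-placeholder case, a check along these lines could flag an edit whose result contains a `// ... existing code ...` line that the file did not have before. This is a minimal sketch: `findLeakedPlaceholders` is a hypothetical helper, and matching the marker as a whole comment line is an assumption about how it shows up.

```typescript
// Matches a line consisting only of the placeholder comment quoted in this report.
// Treating it as a standalone comment line is an assumption.
const PLACEHOLDER = /^\s*\/\/\s*\.\.\.\s*existing code\s*\.\.\.\s*$/;

// Returns the 1-based line numbers of placeholder lines in `after`,
// but only if `after` has more of them than `before` did.
function findLeakedPlaceholders(before: string, after: string): number[] {
  const countBefore = before
    .split("\n")
    .filter((line) => PLACEHOLDER.test(line)).length;

  const hits: number[] = [];
  after.split("\n").forEach((line, index) => {
    if (PLACEHOLDER.test(line)) hits.push(index + 1);
  });

  return hits.length > countBefore ? hits : [];
}

// Illustrative inputs (not taken from a real request).
const beforeEdit = "{\n  class Container {}\n}\n";
const badEdit = "// ... existing code ...\n// test comment\n// ... existing code ...\n";
const goodEdit = "// test comment\n";

console.log(findLeakedPlaceholders(beforeEdit, badEdit)); // [ 1, 3 ]
console.log(findLeakedPlaceholders(beforeEdit, goodEdit)); // []
```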

claude-4.5-haiku seems to fail consistently with this. Given a file and a list of replacements, it creates one tool call per replacement, each replacement removes all the ones before it, and the model still thinks the task has been completed.
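One way this symptom could arise is if every replacement is computed from the same stale snapshot and then written over the file. That is a guess at the mechanism, not a confirmed description of Cursor’s internals. The sketch below contrasts it with applying each replacement to the latest contents. `Replacement`, `applyFromStaleSnapshot`, and `applySequentially` are hypothetical names, and reading the Reproduction 2 prompt as a variable rename is also an assumption.

```typescript
type Replacement = { oldText: string; newText: string };

// Suspected failure mode: each write is derived from the original snapshot,
// so every write discards the replacements made before it.
function applyFromStaleSnapshot(snapshot: string, replacements: Replacement[]): string {
  let file = snapshot;
  for (const r of replacements) {
    // Function replacer avoids special handling of "$" sequences in newText.
    file = snapshot.replace(r.oldText, () => r.newText);
  }
  return file;
}

// Alternative: apply each replacement to the result of the previous one.
function applySequentially(snapshot: string, replacements: Replacement[]): string {
  return replacements.reduce(
    (file, r) => file.replace(r.oldText, () => r.newText),
    snapshot,
  );
}

const minimalTest = 'const a = "b";\nconst b = "b";\nconst c = "c";\n';
const renames: Replacement[] = [
  { oldText: "const a", newText: "const A" },
  { oldText: "const c", newText: "const C" },
];

const trampled = applyFromStaleSnapshot(minimalTest, renames);
const correct = applySequentially(minimalTest, renames);

console.log(trampled.startsWith("const a")); // true: the first rename was lost
console.log(correct.startsWith("const A")); // true: both renames survive
```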
