I don’t think this is best described as a bug report, but something to raise. o3 is excellent for making small changes that require reading a lot of code in a large codebase for understanding. However, it has a strange side behavior where while working on the task, which it tends to complete successfully, it starts randomly tearing down or changing small bits of code in the codebase completely unrelated to the task at hand.
Is anyone else noticing this, or is this something that Cursor currently benchmarks and is trying to improve?
This happens to some degree with all AI models.
From the forum I see that some people use o3 for planning and Sonnet for coding.
Some people surely use o3 for coding as well. With any model, an issue like this can depend on the project, the task, the prompt, the rules in use, the context added to the chat, the length of the chat, and any conflicting instructions, as well as on the model itself.
o3 is a reasoning model, and such models are known for cases where they misinterpret requirements or take inaccurate information as if it were accurate and follow that.
Do you see this behavior at the start of a chat as well, or only later on?
Personally, I see this with other models when they approach the context limit: the model simply gets overwhelmed by competing information, which makes it hard for it to stay focused.
I had the same issue with all models, especially in agent mode.
I added a rule about it, and often ask it explicitly to keep the changes surgical (it is the most efficient way to convey it!), and it has worked well so far.
Here is my limit_edits.mdc for instance:
---
description: Enforce only-the-minimum edits for any request in /src
globs:
- "src/**"
alwaysApply: true
---
# Non-Destructive Edits (Global)
- **Only modify** code that is strictly necessary to fulfill the user’s prompt.
- **Do not** refactor, rename, delete, or reformat any other lines in the file.
- **Insertions** should occur only at or adjacent to the requested location.
I was afraid it would restrain it too much, but I noticed no such effect, and it is much better than having it rewrite 3000 lines at once without a good reason!
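If you want a quick sanity check on top of the rule, here is a minimal sketch of how one might flag files an agent touched outside the intended scope. It is not part of my setup or of Cursor; the function name `out_of_scope_files`, the sample paths, and the allowed patterns are all made up for illustration. It only assumes the tab-separated output of `git diff --numstat`.

```python
from fnmatch import fnmatch


def out_of_scope_files(numstat: str, allowed: list[str]) -> list[str]:
    """Return changed paths that match none of the allowed patterns.

    `numstat` is the text printed by `git diff --numstat`, where each
    line is "<added>\t<deleted>\t<path>".
    """
    flagged = []
    for line in numstat.splitlines():
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        path = parts[2]
        # Note: fnmatch's "*" also matches "/" so "src/auth/*" covers subfolders.
        if not any(fnmatch(path, pattern) for pattern in allowed):
            flagged.append(path)
    return flagged


# Hypothetical example: the task was only supposed to touch src/auth/.
sample = "3\t1\tsrc/auth/login.py\n120\t98\tsrc/utils/format.py\n"
print(out_of_scope_files(sample, ["src/auth/*"]))
```

In practice you would paste in (or pipe) the real `git diff --numstat` output after an agent run and review anything that gets flagged before accepting the changes.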