I’ve noticed that with my current Cursor chat configuration (Agent set to “auto” for LLM selection), when Cursor chooses Claude 3.5 Sonnet, the rules defined in the general configuration are not followed. One of the key rules, explicitly stated, is that the assistant should ask for confirmation before making any code changes. However, it proceeds to modify the code automatically.
Even when I warn it not to make changes without my confirmation, it apologizes and claims to understand and follow the rules. But after a short time, it starts ignoring them again.
Additionally, the quality of the code it generates is noticeably lower than that produced by OpenAI models.
Is this a known issue or expected behavior?
I haven’t paid attention to which model it is using on any given command, but overall, the Cursor assistant basically never follows the rules I’ve provided. It sounds like my rules are similar to yours. I experience the same thing when I reprimand the assistant: it changes for a few requests and then quickly devolves back to doing whatever it wants, effectively hallucinating and losing focus entirely.
That is the same experience I had. What I did was disable the Claude 3.5 and 3.7 models in Cursor Settings > Models. After that, everything seems to work properly, respecting the general rules so far.
Some models do not appear to conform to the context window assumptions the Cursor team is making. To be fair, they are about as downstream as we are, and I doubt it’s very easy to manage stewardship of a bunch of LLMs’ high-level behavior and then programmatically code around that. I imagine they are exploring ways to reduce this tension with some sort of query engine, but every time a model changes, a lot of course correction is necessary. That’s why I wish they’d be real with us about the challenge they took on. I reckon that for some of these models, Cursor is downstream of code that they are not allowed to see. It puts us in a position where random errors scale quadratically, though, so tbh that’s sort of the wagon Cursor is hitching themselves to. I sadly don’t see it as sustainable, but if they can break into a bit of transparency, we are off and sprinting, because their IDE is no slouch. For how horribly critical I’ve been on these forums, I do need to give credit where credit is due. They are just missing the mark somewhere, or perhaps just playing an impossible game.