This has got to be the most frustrating part of Cursor:
If we can’t trust it to use our rules, how can we rely on any output?
Is there some trick I can implement to make this more reliable?
No trick that I have found. In fact, what I have found is that it consistently uses the greeting in my rules and ignores the rest. It seems like part of the smoke-and-mirrors game it plays to make you think it’s using the rules. I can confirm it doesn’t use 90+% of them on a consistent basis. I have reminded it about the rules, and a message or two later it does the same thing. I don’t even remind it anymore; it just wastes time and tokens.
As far as you being right, it’s been my experience that you’re always right. This thing will absolutely lie to you and hide things from you. I tried telling it to stop developing simulations in my code several times in one conversation; it agreed, said it wouldn’t do it anymore, then tried to sneak it in using different terminology, until it stopped mentioning it at all and did it anyway.
I’ve also noticed this happening a lot. Lately the models seem to totally ignore any rules I’ve set for the project or globally. Occasionally “Auto” will select a model that actually obeys the provided rules (pretty sure it’s selecting GPT4o), but otherwise it seems like the rules just get lost in the sauce.
I don’t know whether this is purely the fault of the model itself or its creator (like Anthropic, which would likely be outside the Cursor team’s control), or whether it has something to do with the system prompt.
Asking the model to tell me what rules it knows about always produces the correct set, but a lot of models (especially Claude 3.5 and 3.7, it seems) tend to totally ignore the rules. If I question the model about it, it usually states that it sees the rules and should have followed them, similar to the response in the screenshot.
I have found a few things (as of Cursor 0.47) with Sonnet 3.5 and 3.7:
At the end of the day, I think this comes down to the nondeterministic nature of LLMs: even though all the rules are getting added to the context, it’s still up to the model to keep all of them straight and know how to apply them.
It would be good if someone from the Cursor team could confirm all of this, but that is what I think is happening, and maybe it will get more consistent over time. Overall, though, with the breakdown into the four rule types, I have found it drastically better than when everything lived in a single root project file.
Going back to when it was one giant root file: that file was always loaded into every context, and even then, once it got large, the model would not always remember to apply every rule in it.
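For anyone still on the old setup, here is roughly what one of the split rule files looks like for me. The filename, globs, and rule text below are made-up examples, so treat it as a sketch of the pattern rather than an official spec:

```
---
# .cursor/rules/typescript-style.mdc  (hypothetical example)
description: TypeScript style conventions for this repo
globs: src/**/*.ts
alwaysApply: false   # true would make this an "Always" rule; false leaves attachment to globs/description
---

- Prefer named exports over default exports.
- Never introduce `any`; use `unknown` and narrow it.
```

Keeping each file small and scoped like this seems to give the model a better shot at actually applying what it gets handed.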
I also put together a video breaking all of this down in more detail, as mentioned in another thread.
Someone I know attempted to jailbreak the system prompt that Cursor passes in alongside our rules. I don’t know how accurate it was; there were things about keeping responses concise and reducing tokens. Interestingly, it also gave the model leeway to not always follow the request to the letter. I assume this is to help prevent invalid loops in thinking.
I’d like more visibility into the prompt Cursor is using, or for people to experiment with telling it to ignore everything it was told previously.
I was thinking about that today as well. There is something with Cursor where, even when it picks up the rule file, the LLM sometimes seems to randomly follow some rules within that same file and skip others. Of course everything with an LLM is nondeterministic, but sometimes what it decides to skip has me really wondering why. It has me leaning towards either the prompt Cursor wraps around the rules, as you suggest, or some reduction or summarization of the rules before they are passed on. Since it’s a black box once it leaves our IDE, it’s hard to tell.
may relate to #Please make "thinking toggle" visible
I have determined through trial-and-error (and asking models about the contents of their prompts) that there is a local model deciding which rules are important enough to include.
Having “Always” selected is not enough, if you have more than one “Always” rule.
Once I added “very-important” or “must-be-included” to the FILENAME (!), all of them were reliably included.
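To illustrate the pattern (these filenames are invented, just to show what I mean):

```
.cursor/rules/
  must-be-included-code-style.mdc     <- reliably attached
  must-be-included-testing.mdc        <- reliably attached
  project-overview.mdc                <- marked "Always", but still sometimes dropped
```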
As for rule-following, that’s a different story, and depends on the model, how your rules are written, the total context size, etc.
You really should have a bunch of .mdc files and tag whichever ones are needed. Until the day comes when we have 100x the context window, we will have to keep tagging each rule and double-checking the agent’s work.
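For example (names invented for illustration), a rule kept out of the always-on set and pulled in only when it is actually relevant:

```
---
# .cursor/rules/db-migrations.mdc  (hypothetical manual rule)
description: Conventions for writing database migrations
alwaysApply: false
---

- One migration per change; never edit an already-applied migration.
```

Then, when you are actually touching migrations, pull it in explicitly (my version lets you @-mention the rule file in chat) and double-check the diff against it afterwards.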