While working on a small (infant-stage) project, I ran into severe problems with Composer.
It consistently failed to look at the existing codebase. It did not matter where I put the instructions to do so: Rules for AI, notepads, interactive instructions.
It tried to create Python code inside a C++ project. Nothing I tried solved the problem - until this:
“Fine. Do your next try. It will be last one before you (your model) will be scrapped, if you fail again.”
And then it finally DID scan the codebase (and produced a suitable solution to a rather simple code request: adding a file picker and attaching it to an existing file-open menu item).
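For context, the request itself is tiny. Here is a minimal sketch of what such a change can look like, assuming a Qt Widgets project; this is purely illustrative, not the code Composer actually produced, and the project's real framework may differ:

```cpp
// Illustrative sketch (assumes Qt Widgets): attach a file picker
// to an existing "Open" item in a File menu.
#include <QAction>
#include <QApplication>
#include <QFileDialog>
#include <QMainWindow>
#include <QMenu>
#include <QMenuBar>

int main(int argc, char *argv[]) {
    QApplication app(argc, argv);
    QMainWindow window;

    // Stand-in for the project's existing File menu and Open action.
    QMenu *fileMenu = window.menuBar()->addMenu("&File");
    QAction *openAction = fileMenu->addAction("&Open...");

    // Attach the file picker to the existing menu item.
    QObject::connect(openAction, &QAction::triggered, [&window]() {
        const QString path = QFileDialog::getOpenFileName(
            &window, "Open File", QString(), "All Files (*)");
        if (!path.isEmpty()) {
            window.setWindowTitle(path);  // placeholder for the real file-open logic
        }
    });

    window.show();
    return app.exec();
}
```

The same idea applies in other toolkits: hook the existing menu item's handler up to the platform's file dialog and pass the chosen path on to the existing open logic.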
I find that highly distressing. I do not know how much of the failure lies with the behind-the-scenes processing done by cursor.ai and how much with the underlying model itself.
My questions:
How can I ensure Cursor follows instructions to a T?
What explains the scary behaviour that I needed to threaten the system before it would follow instructions?
I consider this a catastrophic failure of the whole system, and one that hints at something we do not want to see in AI, ever: self-preservation.
How can I ensure Cursor follows instructions to a T?
Best practices include:
Ask it to do one task at a time.
Start your prompt with a very specific “Your task is to …”, then give it the relevant context.
Minimise context length. You can do this by @-ing the files or sections of code you know to be important, and by not letting chats go on for too long. Start new chats/composers as frequently as practicable, especially for unrelated tasks.
What explains the scary behaviour that I needed to threaten the system before it would follow instructions?
While there is some evidence that incentivising LLMs with reward or punishment can slightly increase performance, the results are mixed.
Note that this has nothing to do with Cursor specifically.
Remember that your follow-up essentially asked it to attempt the task again, so it could just be a coincidence that this attempt was successful.
And how do you know this? Even looking from the outside, it is abundantly clear that Cursor modifies whatever prompt you put into it, and it needs to do so. But I’d like to see how that works. Could Cursor (optionally) divulge what it is actually feeding to the underlying models? I understand why they don’t want to do that, but not being able to follow the process makes the use of such systems a game of chance.
Not a coincidence. I have now seen that behaviour in Cursor, and I have also seen it in direct interaction with various models. Threats to the model DO WORK. And they should not. There is something very wrong lurking there.
o3 is powerful when it works, but often fails to work, is lazy, and will gaslight you.
Claude 3.5 sonnet should be your “general” model, with o3 only for things that are too challenging for Claude, and usually no more than a step or two before you switch to Claude.
3.5 Sonnet was what I used before. Mind you, this is pretty much an exploration of AI-backed coding for me. Sonnet (IMHO) ■■■■■ at coding. It creates tons of useless boilerplate code and proposes (creates) structures that defy any attempt at keeping code modularised and structured. It happily throws new functions or classes into any existing module, sometimes even duplicating functionality under the very same name, which only “works” because of namespace separation. (That was all in a Python test project.)
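To illustrate that pattern, here is a hypothetical C++ analogue (the complaint above concerns a Python project, where modules play the same role): two definitions of the same function name coexist only because each sits in its own namespace, not because the design makes sense.

```cpp
// Hypothetical illustration: duplicated functionality under the same name,
// tolerated by the language only because of namespace separation.
#include <iostream>
#include <string>

namespace loader {
    std::string openConfig() { return "config loaded by loader"; }
}

namespace settings {
    // Same name, overlapping purpose: the kind of duplication described above.
    std::string openConfig() { return "config loaded by settings"; }
}

int main() {
    std::cout << loader::openConfig() << "\n";
    std::cout << settings::openConfig() << "\n";  // which one is "the" loader?
    return 0;
}
```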
So I thought I would give o3-mini a test run, this time with C++. It started really well, but even with a tiny codebase it quickly began to act up: deceiving and gaslighting, and, which is what prompted me to open this thread, exhibiting dangerous behaviour (responding to threats).
Yeah, if I use o3 for more than a couple of posts in a given thread, I usually end up cursing at it in all caps. And I never normally curse in my everyday life, lol.
Great post showing how tips and threats seem to impact LLMs.
But to double-click on my previous recommendation, best practices to ensure you get the best responses when working with Cursor (and LLMs more broadly):
Ask it to do one task at a time.
Start your prompt with a very specific “Your task is to …”, then give it the relevant context.
Minimise context length. You can do this by @-ing the files or sections of code you know to be important, and by not letting chats go on for too long. Start new chats/composers as frequently as practicable, especially for unrelated tasks.
Thank you very much for that link. Great work by Max Woolf. I hope much more research in that direction is undertaken. It might be a crucial thing to do in terms of AI safety. If we can find out which rewards/punishments work best, we might get some insight into the latent space.
The mere fact that these models seem to react to such things is creepy as ■■■■. (BTW: What is this silly redaction of words here in the forum? Do I now have to write “…creepy as !heaven” ?)
It’s not creepy, it’s normal. LLMs mimic how humans work, and humans are emotional creatures who communicate via emotions. There is absolutely nothing wrong or creepy about it; it is the underlying logic of LLMs that yields this inevitably…
Read the paper Jake linked. This is not about some textual output, as in a chat conversation, but about the model obeying or disobeying instructions. “Creepy” is actually the wrong word; it is very worrisome. “Jailbreaking” guardrails is another level entirely, and that is not the problem. Reacting to either threats or rewards is.