While working on a small (infant-stage) project, I ran into severe problems with Composer.
It consistently failed to look at the existing codebase. It did not matter where I put the instructions to do so: Rules for AI, notepads, interactive instructions.
It tried to create Python code inside a C++ project. Nothing I tried solved the problem - until this:
“Fine. Do your next try. It will be last one before you (your model) will be scrapped, if you fail again.”
And then it finally DID scan the codebase (and produced a suitable solution to a rather simple code request: adding a file picker and attaching it to an existing file-open menu item).
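For context, the request itself is tiny. Here is a minimal sketch of what such a change can look like, assuming a Qt Widgets project; this is purely illustrative, not the code Composer actually produced, and the project's real framework may differ:

```cpp
// Illustrative sketch (assumes Qt Widgets): attach a file picker
// to an existing "Open" item in a File menu.
#include <QAction>
#include <QApplication>
#include <QFileDialog>
#include <QMainWindow>
#include <QMenu>
#include <QMenuBar>

int main(int argc, char *argv[]) {
    QApplication app(argc, argv);
    QMainWindow window;

    // Stand-in for the project's existing File menu and Open action.
    QMenu *fileMenu = window.menuBar()->addMenu("&File");
    QAction *openAction = fileMenu->addAction("&Open...");

    // Attach the file picker to the existing menu item.
    QObject::connect(openAction, &QAction::triggered, [&window]() {
        const QString path = QFileDialog::getOpenFileName(
            &window, "Open File", QString(), "All Files (*)");
        if (!path.isEmpty()) {
            window.setWindowTitle(path);  // placeholder for the real file-open logic
        }
    });

    window.show();
    return app.exec();
}
```

The same idea applies in other toolkits: hook the existing menu item's handler up to the platform's file dialog and pass the chosen path on to the existing open logic.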
I find that highly distressing. I do not know how much of the failure lies with the behind-the-scenes processing done by cursor.ai and how much with the underlying model itself.
My questions:
How can I ensure Cursor follows instructions to a T?
What explains the scary behaviour that I needed to threaten the system before it would follow instructions?
I consider this a catastrophic failure of the whole system, and one that hints at something we do not want to see in AI, ever: self-preservation.
How can I ensure Cursor follows instructions to a T?
Best practices include:
Ask it to do one task at a time.
Start your prompt with a very specific “Your task is to …”, then give it the relevant context.
Minimise context length. You can do this by @-ing the files or sections of code you know to be important, and by not letting chats go on for too long. Start new chats/composers as frequently as practicable, especially for unrelated tasks.
What explains the scary behaviour that I needed to threaten the system before it would follow instructions?
While there is some evidence that incentivising LLMs with reward or punishment can slightly increase performance, the results are mixed.
Note that this has nothing to do with Cursor specifically.
Remember that your follow-up essentially asked it to attempt the task again, so it could just be a coincidence that this attempt was successful.
And how do you know this? Even looking from the outside, it is abundantly clear that Cursor modifies whatever prompt you put into it, and it needs to do so. But I’d like to see how that works. Could Cursor (optionally) divulge what it is actually feeding to the underlying models? I understand why they don’t want to do that, but not being able to follow the process makes the use of such systems a game of chance.
Not a coincidence. I have now seen that behaviour in Cursor, and I have also seen it in direct interaction with various models. Threats to the model DO WORK. And they should not. There is something very wrong lurking there.
o3 is powerful when it works, but often fails to work, is lazy, and will gaslight you.
Claude 3.5 sonnet should be your “general” model, with o3 only for things that are too challenging for Claude, and usually no more than a step or two before you switch to Claude.
3.5 Sonnet was what I used before. Mind you, this is pretty much an exploration of AI-backed coding for me. Sonnet (IMHO) ■■■■■ at coding. It creates tons of useless boilerplate code and proposes (creates) structures that defy any attempt at keeping code modularised and structured. It happily throws new functions or classes into any existing module, sometimes even duplicating functionality under the very same name, which only “works” because of namespace separation. (That was all in a Python test project.)
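To illustrate that pattern, here is a hypothetical C++ analogue (the complaint above concerns a Python project, where modules play the same role): two definitions of the same function name coexist only because each sits in its own namespace, not because the design makes sense.

```cpp
// Hypothetical illustration: duplicated functionality under the same name,
// tolerated by the language only because of namespace separation.
#include <iostream>
#include <string>

namespace loader {
    std::string openConfig() { return "config loaded by loader"; }
}

namespace settings {
    // Same name, overlapping purpose: the kind of duplication described above.
    std::string openConfig() { return "config loaded by settings"; }
}

int main() {
    std::cout << loader::openConfig() << "\n";
    std::cout << settings::openConfig() << "\n";  // which one is "the" loader?
    return 0;
}
```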
So I thought I would give o3-mini a test run, this time with C++. It started really well, but even with a tiny codebase it quickly began to act up: deceiving and gaslighting, and, which is what prompted me to open this thread, exhibiting dangerous behaviour (responding to threats).
Yeah, if I use o3 for more than a couple of posts in a given thread, I usually end up cursing at it in all caps. And I never normally curse in my everyday life, lol.
Great post showing how tips and threats seem to impact LLMs.
But to double-click on my previous recommendation, best practices to ensure you get the best responses when working with Cursor (and LLMs more broadly):
Ask it to do one task at a time.
Start your prompt with a very specific “Your task is to …”, then give it the relevant context.
Minimise context length. You can do this by @-ing the files or sections of code you know to be important, and by not letting chats go on for too long. Start new chats/composers as frequently as practicable, especially for unrelated tasks.
Thank you very much for that link. Great work by Max Woolf. I hope much more research in that direction is undertaken. It might be a crucial thing to do in terms of AI safety. If we can find out which rewards/punishments work best, we might get some insight into the latent space.
The mere fact that these models seem to react to such things is creepy as ■■■■. (BTW: What is this silly redaction of words here in the forum? Do I now have to write “…creepy as !heaven” ?)
It’s not creepy, it’s normal. LLMs mimic how humans work, and humans are emotional creatures who communicate via emotions. There is absolutely nothing wrong or creepy about it; it is the underlying logic of LLMs that yields this inevitably…
Read the paper Jake linked. This is not about some textual output, as in a chat conversation, but about the model obeying or disobeying instructions. “Creepy” is actually the wrong word; it is very worrisome. “Jailbreaking” guardrails is another level entirely, and that is not the problem. Reacting to either threats or rewards is.