The rogue-like behavior thing was me. BTW, gpt-5-fast has worked pretty well for me. It's worth a try, I think!
First, FWIW, I have a lot of rules now. Over time, I've been building up a ruleset, and it's about 22 or 23 .mdc files now. Every time I hit a real problematic issue, I have Cursor write a rule for me. It does help keep the agents corralled, or at least better than not having them. Still, I had probably 18 rules on Saturday, and man, the models were still running rogue…
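For anyone who hasn't built up rules yet: a Cursor rule is just an .mdc file (under .cursor/rules/) with a short frontmatter block followed by the rule text. A minimal sketch of what one of mine roughly looks like; the description, glob, and rule lines here are made-up examples, not my actual ruleset:

```
---
description: Guardrails for destructive terminal commands
globs: **/*
alwaysApply: true
---
- Never run commands that discard uncommitted work (e.g. git checkout --, git reset --hard, git clean) without asking me first.
- Prefer git stash over discarding changes.
- Do not rewrite files outside the directories I named in the prompt.
```

The frontmatter controls when the rule is attached (always, or only for files matching the glob); the body is plain instructions the agent reads.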
I definitely had some VERY ROGUE behavior, well, I guess it was both Friday and Saturday. Saturday had other issues too, with "conversation length too long" eventually preventing me from using the agent at all with Auto (don't know if that issue is fixed yet, haven't heard that it was).
But GPT-5 is not the only one that I thought was going rogue a lot on me. I call Gemini 2.5 "The Bulldozer" because IT loves to go rogue. Now, when using GPT-5, I had some dangerous commands crop up, things that would have cost me data, code, etc. They did not run (thank you, allow lists for terminal commands!), but they scared the ■■■■ out of me the first couple of times before I realized they had not actually run. With the first one, I thought I'd lost my entire git workspace (the only real at-risk code when using git!) because the model (I think it was GPT-5) tried to run git commit -- src/
(Almost ALL of my changes were in the src/ directory at the time!) I stopped using the default gpt-5 model after that and went to Sonnet for a bit. I then heard people were having good results with gpt-5-fast and tried that (so far, so good…)
Gemini, at least so far, hasn't done anything outright heart-stopping like THAT, but it is a freakin bulldozer. It just likes to 'doze over my codebase most of the time I use it. It seems hyper-opinionated and WILL do things ITS WAY, or bust. It doesn't like Claude 4 Sonnet code and just wants to change it. All the time. So I stopped using Gemini to code. I'm fine creating plans with it, it works well there, but I DO NOT let it change code most of the time. I suspect there was a period of time I was on Auto mode on Saturday…it may well have been Gemini that did some of the rogue code changes, but you can't know for sure anymore with Auto mode (at least, with one of the recent updates, it seemed like the agent would not let model-query type prompts through and would just respond with a canned message…not sure if that's still in place).
But yeah, I have experienced too much rogue behavior from models. I generally use C4S for coding, maybe now it will be C4S and GPT5Fast; Gemini for planning (although GPT-5 Fast plans really well, too); GPT4x for simpler questions and basic research. I've used Gemini for research a lot, and will probably use GPT-5 for research as well. I have a tough time with Auto mode. I would like to use it, but it just includes ALL models, you can't exclude any, so you never know when Auto-mode requests are gonna bulldoze your codebase. It would be nice to have some exclusions in there, so when you KNOW a model is just NOT suited to coding, you can prevent Auto from using it. Then it might fall back on something a little slower, say Kimi K2, but IMO that would be preferable to having Gemini stomp all over your code and tell you it ■■■■■…