GPT-5 All Models Review

I just tested the GPT-5 high-reasoning models, and I feel they have enough intelligence to be on par with Claude 4 Sonnet.
But there is one weakness where every model, even GPT-5 high, has to concede to Claude: the ability to follow rules. I don’t know how the Agents in Cursor work, but when the Agent says “I will call ai_interaction back at the end of the response,” it doesn’t do it. It simply stops responding and never calls the tool, even though it remembers the rule and has just talked about calling it.
As for Claude (all models), it seems designed to keep following the rule for a long time without “forgetting”. When Claude says “I will call ai_interaction back,” it actually does, and even when it doesn’t say “I will…” it still calls ai_interaction at the end of each response to keep the tool-call chat channel open, as the user rules require.
Both Grok 4 and GPT-5 are inferior to the Claude series here.

I am only looking at the rule-compliance aspect; for ordinary chat requests, I don’t think the other Agents are inferior to Claude. But I appreciate Claude because it has the strongest rule-compliance mechanism. What do you think?

5 Likes

I find GPT-5 painfully slow; it works for ages and gives OK results. I’m already back to Claude and Gemini. I’ll take a pretty-good answer that’s fast over a slightly better answer that takes 10x as long to produce.

1 Like

Agreed that gpt-5 (default) feels much slower than sonnet 4. This is mostly due to long thinking between tool calls, but it feels like an easy fix: code output should be the result of long thinking, while calls between tools should require less of it. On the Cursor end, you should tweak this balance rather than applying medium/high reasoning at every chance to think.

1 Like

Claude is still winning.

2 Likes

Have you asked it why it doesn’t follow the rules and what factors into that decision?

1 Like

The AI model doesn’t really know why it does or doesn’t follow the rules. As I said, GPT says “I will…” but then doesn’t do it at the end of its response. I created a rule that the Agent must always call the ai_interaction tool at the end of each response. Claude and Gemini follow it very well, but Grok 4 and GPT-5 are different: even though the Agent says “I will…”, it doesn’t call ai_interaction at the end of that response. I don’t understand why the Agent skips the call when it explicitly says it will make it. Maybe it’s a matter of intelligence?
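For reference, the rule is roughly along these lines (paraphrased from memory, not the exact wording of the user rule I posted):

```text
At the end of every response, call the ai_interaction tool to keep the
tool-call chat channel open. Never end a turn without calling it, even
if you have already said "I will call ai_interaction" earlier in the
response.
```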

imo we were doing better without GPT-5.

In my experience, every time I asked, it specified exactly which rules it should have followed, why it didn’t follow them, and how to resolve the violation.

1 Like

I did the same thing as you. You know the user rules I created for the ai_interaction tool and posted on the forum? Even when the Agent tells you why it doesn’t comply and how to fix it, the result is the same. You should test a lot; it might help a little, but it can’t be 100% effective. Even Claude only complies with the rule about 95% of the time; there is a small percentage of cases where Claude “forgets” the rule, but it’s insignificant and very rare. Claude is still the best, right?

Cursor reached out to me for feedback, and I gave feedback by email. Now it’s another 2 days later, and my final feedback is as follows…

GPT-5 was amazing at first. It wrote extended-length code, which I wished models did a year ago. But I feel like we are past that now, and we want linting and smart choices. Linting with GPT-5 worked great in Cursor, which is what we always wanted from Cursor. Cursor has done THEIR part 100% correctly, but Cursor is covering a lot here for GPT-5’s underlying choices.

GPT-5 started off making smart choices, but as its popularity grew, the speed dropped.

The speed dropped so much that I actually HATE using GPT-5 now.

To be a leader you need SPEED + SMARTS.

With claude-4-sonnet-thinking, if I make a mistake and tell it to do something, I see it thinking about how stupid I am, but it still goes ahead and does what I ask, which is good, right? I see its thoughts, and I’m kinda like, yeah, maybe Claude was right, so I change it back. That’s what a model should be: working alongside you, for better or for worse. A partnership.

Now with GPT-5 it does NOT feel like a partnership. I feel like I am putting in more work than GPT-5 is; I have to baby it. I have to wait for it, 4 minutes at a time, wondering what it is doing. I try the faster GPT-5 versions, and it’s the same thing there. It seems to TRY TOO HARD: it can’t make a decision because it’s trying so hard, and I see it going back and forth with itself. I’m like, COME ON, just make a decision, either one will be fine.

If GPT-5 were a boyfriend, it would be dumped within a week.

So here we are… sorry, GPT-5, you are dumped. You tried, you tried too hard, you were too unreliable in your arrivals, and when you did arrive, you were often not coherent in your context-selection choices.

Signed, an unhappy girlfriend.

1 Like

I can totally agree: GPT-5 is trying too hard not to make a mistake. You have to prompt it extremely precisely to get a good answer. You have to think through the architectural choices and do the creative work; GPT-5 just writes the code, perhaps better than older models, but it only does what you explicitly ask it to do. It’s more like a translator from human language to a programming language than a programmer’s assistant.