I just tested GPT-5 with high reasoning. I feel this model is intelligent enough to be on par with Claude 4 Sonnet.
But there is one area where every model, even GPT-5 high, falls short of Claude: the ability to follow rules. I don't know how the Agents in Cursor work internally, but when the Agent says "I will call ai_interaction again at the end of the response," it doesn't actually do it. It simply stops responding and never calls the tool, even though it remembers the rule and has just talked about calling it.
Claude (all models), by contrast, seems designed to follow rules over a long horizon without "forgetting." When Claude says "I will call ai_interaction again," it actually does so. Even when it doesn't say "I will...," it still calls ai_interaction at the end of each response to keep the tool-call chat channel alive, as required by the user rules.
In this respect, both Grok 4 and GPT-5 are inferior to the Claude series.
I am only looking at the rule-compliance aspect here. If we look at handling each request in ordinary chat, I don't think the other Agents are inferior to Claude. But I value Claude because it has the strongest rule-compliance behavior. What do you think?
