Gemini 2.5 Pro + Claude 4 Sonnet Thinking > Grok 4 >? GPT-5 >> Auto > GPT-5 Mini > Sonic
Grok 4 is broken - I like its approach, but it keeps failing to continue the chat.
Gemini 2.5 Pro is less broken than Grok 4 - it can also interrupt execution in the middle of a task, like Grok, but it does so less often.
Claude Sonnet 4 Thinking is very good when it has enough intelligence for the task, but expensive. Very good as a QA engineer. I usually use it when other models have ■■■■■■ me off with their stupidity or when I’m fed up with the errors in Cursor Agent Chat.
I tried Claude Opus 4.1 Thinking and it was absurdly expensive considering what I got in return.
GPT-5: I’m still trying it out. I switch between GPT-5-high and GPT-5-low.
Sonic (Grok 4 Coder Lite?) is dumb but lightning fast. I really hope the unnamed provider shipped a lightweight version of the neural network and not the full one.
I previously recommended o4-mini as a budget option. Now it seems slow, and it’s also pretty lazy. So for very simple tasks, use Auto, GPT-5 Mini, or Sonic. Or GPT-5-low - it’s pretty cheap and powerful.
I also recommend giving my tools a try (though I’ve had to put their development on hold for now).
@condor Reddit told me that the Reddit version of my post in r/Cursor was auto-deleted by Reddit autofilters, even after I removed my GitHub links from it. Can you help me somehow?
I sent it to the Reddit mods. They mentioned that autofilters are not managed by mods, so this is automatic. Also, it seems your account there was suspended, which may be another reason why it’s not showing.
Edit: Moved Gemini to the top of the food chain. Still not 100% stable, but for complex tasks it’s the best in terms of cost/quality ratio.
Grok 4 also approaches tasks comprehensively and in some cases is better than Gemini, but it’s chaotically good - you never know which run will be the brilliant one.
Gemini is good for writing code. For debugging too.
Claude is worse at code architecture. Perhaps that’s not quite the right or complete term, but that is exactly the feeling.
Gemini sometimes breaks right in the middle of execution. Claude is absolutely stable.
The main thing: they have different approaches. When one works poorly, let the other one have a look.
Well, Claude is the least lazy of all the models. Sometimes that’s bad. But if you launch it knowing this in advance, it turns out great.
Grok 4 is sometimes smarter and can work for tens of minutes without interruption. But I can’t say it’s better than Gemini or Claude - at least I couldn’t get that result, although I had high hopes for Grok.
If you use Agent Compass, the agent outputs a report at the end of its work. GPT-5 lies in these reports, even when it has evidence that it is lying. Neither Gemini nor Claude allows itself to do this.
My choice of models now: Gemini 2.5 Pro, GPT-5-high, Grok 4, Grok Coder Fast
Gemini 2.5 Pro and GPT-5-high: GPT-5-high is smarter, but Gemini is less lazy. I thought Gemini had a more thorough approach because it’s smarter, but in reality GPT-5 is just a soulless machine that does only what it was asked in the last request. You can’t even give it hints mid-run through Send to Queue - it will just ignore the previous commands and do only what was in that short hint.
Grok 4 is expensive and still broken. Occasionally I run it to get a different perspective on a problem.
Grok Coder Fast - I use it to quickly gather information about something in the repository, but sometimes it’s too dumb even for that. It can be replaced with Auto or GPT-5-mini (I haven’t tried -nano, since mini is already quite far behind in intelligence).
Claude Sonnet 4 Thinking - I like it, but it’s too expensive relative to everything else.