How To Optimize Your Usage: The Best AI Models to Use, version 2.2

Previous version of the guide:


Gemini 2.5 Pro + Claude 4 Sonnet Thinking > Grok 4 >? GPT-5 >> Auto > GPT-5 Mini > Sonic

  • Grok 4 is broken - I like its approach, but it is consistently unable to continue the chat.
  • Gemini 2.5 Pro is less broken than Grok 4 - it can also interrupt execution in the middle of a task, like Grok, but less often.
  • Claude Sonnet 4 Thinking is very good when it has enough intelligence for the task, but expensive. Very good as a QA engineer. I usually use it when other models have ■■■■■■ me off with their stupidity or when I’m fed up with the errors in Cursor Agent Chat.
  • I tried Claude Opus 4.1 Thinking and it was absurdly expensive considering what I got in return.
  • GPT-5: I’m still trying it out. I switch between GPT-5-high and GPT-5-low.
  • Sonic (Grok 4 Coder Lite?) is dumb but lightning fast. I really hope the unnamed provider supplied a lightweight version of the model rather than the full one.
  • I previously recommended o4-mini as a budget option. Now it seems slow, and it’s also pretty lazy. So for very simple tasks, use Auto, GPT-5 Mini, or Sonic. Or GPT-5-low - it’s pretty cheap and powerful.

I also recommend giving my tools a try (though I’ve had to put their development on hold for now :melting_face:).


Turn on the bell (Watching) to get updates about the guide!

@condor Reddit told me that the Reddit version in r/Cursor was auto-deleted by Reddit’s autofilters, even after I removed my GitHub links from it. Can you help me somehow?

I sent it to the Reddit mods. They mentioned that the autofilters are not managed by mods, so this is automatic. It also seems your account there was suspended, which may be another reason it isn’t showing.

Edit: Moved Gemini to the top of the food chain. Still not 100% stable, but for complex tasks it’s the best in terms of cost/quality ratio.
Grok 4 also approaches tasks comprehensively and in some cases is better than Gemini, but it is chaotically good.

I decided to recommend both Claude and Gemini.

  • Gemini is good for writing code. For debugging too.
  • Claude is worse at code architecture. Perhaps that’s not exactly the right term, but that is exactly the feeling.
  • Gemini sometimes breaks right in the middle of execution. Claude is absolutely stable.
  • The main thing: they have different approaches. When one works poorly, let the other one have a look.
  • Claude is also the least lazy of all the models. Sometimes that’s bad, but if you launch it with that intent deliberately, the results are great.
  • Grok 4 is sometimes smarter and can sometimes work for tens of minutes without interruption. But I can’t say it’s better than Gemini or Claude - at least I couldn’t achieve that, although I really had hopes for Grok.



I thought Sonnet 4 was more expensive than Grok 4 :thinking:


If you use Agent Compass, the Agent outputs a report at the end of its work. GPT-5 lies in these reports, even when it has evidence that it is lying. Neither Gemini nor Claude allows itself to do this.


I jumped to conclusions - all models (GPT-5, Gemini, Grok 4) have problems noticing an incorrect message in the console output :man_facepalming:


My choice of models now: Gemini 2.5 Pro, GPT-5-high, Grok 4, Grok Coder Fast

  • Gemini 2.5 Pro and GPT-5-high: GPT-5-high is smarter, but Gemini is less lazy. I thought Gemini had a more complex approach because it’s smarter, but in reality GPT-5 is just a soulless machine that does only what it was asked to do in the last request. You can’t even give it hints along the way via Send to Queue - it will just ignore the previous commands and do only what was in that short hint.
  • Grok 4 is expensive and still broken. Occasionally I run it for a different look at the problem.
  • Grok Coder Fast - I use it to quickly collect information about something in the repository, but sometimes it’s too dumb even for that. It can be replaced with Auto or GPT-5-mini (I haven’t tried -nano, since mini is already quite far behind in intelligence).
  • Claude Sonnet 4 Thinking - I like it, but it’s too expensive relative to everything else.

I forgot to say that Gemini is the best in terms of cost-quality-speed. But GPT-5-high is more trouble-free.


GPT-5 starts to get dumb in the middle of its available context window, or even past 40%.
It would be great to have forced context compression so that you don’t have to redo context engineering in a new chat.


By the way, GPT-5 became my main agent.

  • gpt-5-high >= gpt-5 > gemini 2.5 pro >> grok coder fast

And I still begrudge the money it would take to compare it with Sonnet 4. :eyes:



God bless Elon and xAI for that 15 minutes and 0.35 free American dollars of refactoring :smiling_face_with_sunglasses:

Have you had hands-on time with gpt-5-codex to be able to judge it yet? Thank you for this thread and all your time!

Codex is weird. The benchmarks are tasty, but I don’t want to waste time retraining myself away from gpt-5-high.


I tried using code-supernova-1-million as a cheap QA model that would increase test coverage and fix problems in my TypeScript project along the way.

As a result, I had 207 tests, of which 187 passed. I switched to gpt-5-high (I also wrote “gpt-5, I’m switching to you”) and asked it to double-check the changed tests and finish the work; at the end of the prompt, I summarized the context via /summarize.

After it completed its work, I sent this prompt:

Can the quality of the tests written before switching to you be qualified as good?
Just answer the question.

Full answer of gpt-5-high

No.

:joy::joy::joy:


gpt-5-high >= gpt-5 > Grok Code Fast

  • gpt-5-high: Stable and smart.
  • gpt-5: Sometimes I use it to save money, but I’m not sure I’m really saving.
  • Grok Code Fast: Excellent for gathering information and simple tasks.
  • UPD3: gpt-5-mini is slightly smarter than Grok Code Fast, so you can use it for simple tasks or reasoning (the ChatGPT Free plan uses gpt-5-mini).

  • Gemini 2.5 Pro is outdated relative to gpt-5 and Grok 4.
  • Grok 4 is probably still broken. I’ve already lost hope and don’t want to check.
  • Claude 4.5 Sonnet Thinking is too expensive;
    UPD2: the best model for creating and working with huge documentation.
  • gpt-5-codex is strange, so I’m too lazy to research it.
  • Grok 4 Fast Reasoning is too lazy a model; it’s better to take gpt-5 or Grok Code Fast.

Also, I’ve downgraded to Cursor v1.6.46, because Cursor v1.7.x ■■■■■.


I was very surprised by Grok Code. Besides being very responsive, it performs much better than many other models. I think AI model performance varies depending on the programming language used, the size of the project, and how the models are used.