Gemini 2.5 Pro + Claude 4 Sonnet Thinking > Grok 4 >? GPT-5 >> Auto > GPT-5 Mini > Sonic
Grok 4 is broken - I like its approach, but it keeps failing to continue the chat.
Gemini 2.5 Pro is less broken than Grok 4 - it can also interrupt execution in the middle of a task, like Grok, but it does so less often.
Claude Sonnet 4 Thinking is very good when it has enough intelligence for the task, but expensive. Very good as a QA engineer. I usually use it when other models have ■■■■■■ me off with their stupidity or when I’m fed up with the errors in Cursor Agent Chat.
I tried Claude Opus 4.1 Thinking and it was absurdly expensive considering what I got in return.
GPT-5: I’m still trying it out. I switch between GPT-5-high and GPT-5-low.
Sonic (Grok 4 Coder Lite?) is dumb but lightning fast. I really hope the unnamed provider shipped a lightweight version of the neural network and not the full one.
I previously recommended o4-mini as a budget option. Now it seems slow, and it’s also pretty lazy. So for very simple tasks, use Auto, GPT-5 Mini, or Sonic. Or GPT-5-low - it’s pretty cheap and powerful.
I also recommend giving my tools a try (though I’ve had to put their development on hold for now).
@condor Reddit told me that the Reddit version of my post in r/Cursor was auto-deleted by Reddit autofilters, even after I removed my GitHub links from it. Can you help me somehow?
I sent it to the Reddit mods. They mentioned that autofilters are not managed by mods, so this is automatic. Also, it seems your account there was suspended, which may be another reason why it’s not showing.
Edit: Moved Gemini to the top of the food chain. Still not 100% stable, but for complex tasks it’s the best in terms of cost/quality ratio.
Grok 4 also approaches tasks comprehensively and in some cases is better than Gemini, but it’s chaotically good - you never know which run will be the brilliant one.
Gemini is good for writing code. For debugging too.
Claude is worse at code architecture. Perhaps that’s not quite the right or complete term, but that is exactly the feeling.
Gemini sometimes breaks right in the middle of execution. Claude is absolutely stable.
The main thing: they have different approaches. When one works poorly, let the other one have a look.
Well, Claude is the least lazy of all the models. Sometimes that’s bad. But if you launch it knowing this in advance, it turns out great.
Grok 4 is sometimes smarter and can work for tens of minutes without interruption. But I can’t say it’s better than Gemini or Claude - at least I couldn’t get that result, although I had high hopes for Grok.
If you use Agent Compass, the agent outputs a report at the end of its work. GPT-5 lies in these reports, even when it has evidence that it is lying. Neither Gemini nor Claude allows itself to do this.
My choice of models now: Gemini 2.5 Pro, GPT-5-high, Grok 4, Grok Coder Fast
Gemini 2.5 Pro and GPT-5-high: GPT-5-high is smarter, but Gemini is less lazy. I thought Gemini had a more thorough approach because it’s smarter, but in reality GPT-5 is just a soulless machine that does only what it was asked in the last request. You can’t even give it hints mid-run through Send to Queue - it will just ignore the previous commands and do only what was in that short hint.
Grok 4 is expensive and still broken. Occasionally I run it to get a different perspective on a problem.
Grok Coder Fast - I use it to quickly gather information about something in the repository, but sometimes it’s too dumb even for that. It can be replaced with Auto or GPT-5-mini (I haven’t tried -nano, since mini is already quite far behind in intelligence).
Claude Sonnet 4 Thinking - I like it, but it’s too expensive relative to everything else.