About model selection, my opinion

tangjun · September 3, 2025, 3:09am

I encountered a hard-to-detect bug. The fix itself is simple, but finding it requires reading a lot of code and doing complex analysis.

After my testing, the following models proposed correct solutions:

gpt-5
grok-4
claude-4-sonnet-thinking

The following models failed to propose correct solutions:

gemini 2.5 pro
grok-code-fast-1
deepseek-3.1

grok-4 not only proposed concise and useful solutions, but also identified other potential subtle issues during the process and offered solutions for them.

Conclusion

grok-4 excels at solving difficult problems that don’t require outputting large amounts of code.
gpt-5 excels at solving complex tasks that require outputting large amounts of code, but it’s very slow.
grok-code-fast-1 is good at handling problems of moderate complexity with medium output volume, and it’s very fast.
claude-4-sonnet is roughly comparable to gpt-5, but I feel that claude is more obedient, while gpt-5 is smarter.

deanrie · September 3, 2025, 6:32am

Hey, thanks for sharing your observations about the models. I also think grok-4 is good for complex tasks and roughly comparable to claude-4.1-opus, but much cheaper. As for grok-code-fast-1, it’s still a new model, so it can sometimes behave unpredictably. I hope it’ll be optimized and start providing higher-quality answers.

Artemonim · September 3, 2025, 6:42am

leoing · September 3, 2025, 7:08am

It’s not always the same; it’s worth rotating models on hard bugs. Yesterday o3 saved my day, while gpt5-high, Claude Code (Sonnet), and Gemini Pro all failed.

Gpt5-fast-high nails complex bugs most often. Since the Cursor team keeps us in the dark about the context window etc., one has to conclude that it works best in Codex. So I bought Plus yesterday.

Thanks for the heads-up on grok4; I had always ignored the Grok models as rather bland token wasters.

That stupid grok-code-fast thing can also be useful for simple bugs like linter errors - it just races through them.

Also Sonnet in Claude Code seems to write the best plans (Opus supposedly even better).

tangjun · September 3, 2025, 9:00am

Yes, I think grok4 is good at thinking through hard problems, but it is not good at generating large amounts of multi-file code.

system · December 2, 2025, 9:00am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
