About model selection, my opinion

I ran into a hard-to-detect bug. The fix itself is simple, but finding it means reading a lot of code and doing some fairly complex analysis.

In my testing, the following models proposed correct solutions:

  • gpt-5
  • grok-4
  • claude-4-sonnet-thinking

The following models failed to propose correct solutions:

  • gemini 2.5 pro
  • grok-code-fast-1
  • deepseek-3.1

grok-4 not only proposed a concise and useful solution, but also spotted other subtle potential issues along the way and offered fixes for them.


Conclusion

  • grok-4 excels at solving difficult problems that don’t require outputting large amounts of code.
  • gpt-5 excels at solving complex tasks that require outputting large amounts of code, but it’s very slow.
  • grok-code-fast-1 is good at handling problems of moderate complexity with medium output volume, and it’s very fast.
  • claude-4-sonnet is roughly comparable to gpt-5, but I feel that claude is more obedient, while gpt-5 is smarter.
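
For anyone who wants to script their model choice, here is a rough sketch of the rules of thumb above as code. The function name `pick_model` and its parameters are made up for illustration, and the mapping only restates my impressions from this one bug hunt. It is not a benchmark result.

```python
def pick_model(hard: bool, output: str, strict_instructions: bool = False) -> str:
    """Return a model name based on the rules of thumb in this post.

    hard: True for difficult problems, False for moderate or simple ones.
    output: expected amount of generated code: "small", "medium" or "large".
    strict_instructions: True if following instructions closely matters
        more than raw problem-solving ability (hypothetical parameter).
    """
    if output not in ("small", "medium", "large"):
        raise ValueError(f"unknown output size: {output!r}")

    # Large amounts of code: gpt-5 (slow), or claude-4-sonnet if obedience matters more.
    if output == "large":
        return "claude-4-sonnet" if strict_instructions else "gpt-5"

    # Hard problem without a large amount of output: grok-4.
    if hard:
        return "grok-4"

    # Moderate complexity, small or medium output, and speed is welcome.
    return "grok-code-fast-1"


if __name__ == "__main__":
    print(pick_model(hard=True, output="small"))
    print(pick_model(hard=False, output="medium"))
```

Treat the result as a starting point only. If the first model fails on a bug, trying another one is still the practical fallback.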

Hey, thanks for sharing your observations about the models. I also think grok-4 is good for complex tasks and roughly comparable to claude-4.1-opus, but much cheaper. As for grok-code-fast-1, it’s still a new model, so it can sometimes behave unpredictably. I hope it’ll be optimized and start providing higher-quality answers.


Results aren’t always the same, so it’s worth rotating models on hard bugs. Yesterday o3 saved my day, while gpt5-high, Claude Code (Sonnet), and gemini pro all failed.

Gpt5-fast-high nails complex bugs most often. Since the Cursor guys keep us in the dark about the context window etc., one has to conclude that it works best in Codex. So I bought Plus yesterday.

Thanks for the heads-up on grok4. I always ignored it as a rather bland token waster.

That stupid grok-code-fast thing can also be useful for simple bugs like linter errors. It just races through them.

Also Sonnet in Claude Code seems to write the best plans (Opus supposedly even better).

Yes, I think grok4 is good for thinking through hard problems, but it is not good at generating large amounts of multi-file code.

:rofl:
