What am I doing wrong with claude?

Hello! Everywhere I look, I find claims that claude-4 is the best at coding. Here is my story.

For my app I asked 3 models the same thing: to analyze my code and give me a plan to implement a new feature.

Tested models:
o3, gemini-2.5-pro and claude-4-sonnet (all thinking variants)

My prompt was the same and all chats were new. But I forgot to mention in the prompt one requirement which is crucial for the right architecture.

Out of the 3 models, only gemini gave me a good architectural plan, because it pointed out the possible requirement which was not mentioned.

o3 and claude gave me plans with the wrong architecture - they did not foresee the potential problem like gemini did. I had to add the additional requirement for them to redraw the plan.

Then I gave all three plans to each model and asked it to compare them in terms of robustness, logic, good practices, etc.

As a result, all 3 models gave me practically the same response:
o3 - best, most robust and clever plan
gemini - almost the same, but with some minor flaws
claude-4 - the plan is the worst (and interestingly, claude itself gave the most negative review of its own plan).

I understand that it is only one case. But how come the “best” coding model, according to maaaany opinions out there, makes the worst coding decisions?

Second - when I presented the same plan to claude, but told it that it was its own plan and asked it to analyze it, claude answered that it is a very good plan and very smart!

What am I doing wrong?

You’re using claude to structure your plan, I personally don’t do that!

The best ones for structuring are ChatGPT o1/o3 or DeepSeek.

but to EXECUTE THE PLAN!

oh my friend… then claude gets stupidly better imo haha

Hey, thank you for the reply. I did not like how claude-3.7 did coding - it almost always ruined code which had nothing to do with my task. I have not yet tested Claude-4 in this regard. o3, on the other hand, almost never ruined my code. Gemini is 50/50.

Hi @exoder and welcome to Cursor Forum :slight_smile:

I also did not like how Claude 3.7 Sonnet coded: it made mistakes or wrote too much code for features I never asked for.

Claude 4 Sonnet works well with coding for me.

It's definitely good to try different models to see how they perform.

Hi @davedev, welcome to Cursor Forum to you too, and thanks for contributing :slight_smile: It's always good to see what works for others.