ChatGPT 5.1 Codex High vs Gemini 3 Pro vs Claude Sonnet 4.5 for coding

neverinfamous · November 20, 2025, 5:31am

What’s everyone’s experience so far with these three models? I have seen reports that ChatGPT 5.1 Codex High tests highest for coding, and people reporting good results with Gemini 3 Pro. But Sonnet is pretty reliable for most use cases, so I would like more information before trying 5.1 and 3 Pro on complex tasks. Also, how do their costs compare?

neverinfamous · November 21, 2025, 6:37am

Article on the subject:

Igor_Markin · November 21, 2025, 8:20am

How I work with these models:

Task: Clarify the problem, context, and requirements; choose an approach
Model: GPT 5.1 High Fast
Why: Acts like a senior architect: defines constraints, risks, and the overall solution idea so you don’t waste effort going in the wrong direction.

Task: High-level planning and architectural/technical design
Model: GPT 5.1 High Fast
Why: Best suited for thinking through architecture, module contracts, implementation options, and the pros/cons of each approach.

Task: Detailed implementation plan broken down by files and steps (task breakdown)
Model: Composer 1
Why: Based on the given architecture, turns it into a clear step-by-step plan without excessive “creativity”.

Task: Finding the right files, functions, code fragments, and explaining the current implementation
Model: Composer 1
Why: Quickly navigates the project and gives understandable explanations without heavy analytical overhead.

Task: Draft code and local changes (1–2 files, without changing public interfaces)
Model: Composer 1
Why: Like a solid junior dev: writes prototypes and local changes reliably when the scope is clearly defined and it is told not to touch anything else.

Task: Fast bulk edits following a simple pattern (renames, small mechanical changes)
Model: Composer 1
Why: Good for mechanical work within a limited file scope when you explicitly define the boundaries of changes.

Task: Reviewing draft code from Composer 1 and making targeted improvements
Model: Codex 5.1 High
Why: Acts as a senior reviewer: finds non-obvious issues, improves readability, and checks consistency with the architectural plan.

Task: Refactoring that touches multiple modules and public interfaces
Model: Codex 5.1 High
Why: Sees the big picture, can propose a refactoring plan, preserve/improve architecture, and minimize side effects.

Task: Designing and writing tests (unit/integration), analyzing coverage
Model: Codex 5.1 High
Why: Better at designing test cases, edge-case scenarios, and updating existing tests to match new changes.

Task: Diff analysis and producing a “what changed and why” report
Model: Codex 5.1 High
Why: Can describe changes concisely and in a structured way, which reduces the risk of hidden side effects and makes code review easier for you.

Task: Final high-level check of architecture and risks (after major changes)
Model: GPT 5.1 High Fast
Why: Looks at the system from above: compares the final implementation with the original goals, architectural decisions, and long-term impact on the project.

endrits079 · November 21, 2025, 10:29am

Yesterday I tried all 3 models on the same task:
GPT 5.1: didn’t get the task right
GPT 5.1 Codex High: did the task, but with a lot of changes
Sonnet 4.5: got the task right, with a much better result and far fewer changes than Codex
Gemini 3 Pro: got the task right, with a slightly better result than Sonnet 4.5 and even fewer changes than Sonnet 4.5

Unfortunately, this uses up quota unnecessarily, but I will experiment more and see which model does better across multiple scenarios.

Scenario:

I had a page built with React with a two-column layout. On the left side there was a table displaying a list of items, with pagination at the end; on the right side there were some cards displaying information.

This was the task:
Update the table so that it always fits the viewport and the pagination is always displayed at the bottom. When changing pages with a different number of items, the table size should not change, to prevent layout jumps. If the table has more items than can fit in the viewport, the table should be scrollable.
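For reference, one common way to meet these requirements (a minimal sketch, not the code any of the models produced) is a flex column whose height is tied to the viewport rather than to the row count, with a scrollable wrapper around the table and a fixed-size pagination footer. The helper name `leftColumnStyles` and the `topOffsetPx` parameter (for any header above the page) are hypothetical; the objects are plain style objects shaped like React inline styles, so no React import is needed.

```typescript
// Hypothetical helper: style objects for the left column described above.
type Style = Record<string, string | number>;

interface LeftColumnStyles {
  column: Style;       // fills the viewport height and stacks children vertically
  tableWrapper: Style; // takes the remaining height; scrolls when rows overflow
  pagination: Style;   // fixed-size footer that stays at the bottom
}

function leftColumnStyles(topOffsetPx = 0): LeftColumnStyles {
  return {
    column: {
      display: "flex",
      flexDirection: "column",
      // Height depends only on the viewport (minus an assumed header offset),
      // never on the number of rows, so switching pages cannot resize it.
      height: `calc(100vh - ${topOffsetPx}px)`,
    },
    tableWrapper: {
      // Grow to fill the space left by the pagination, even on short pages.
      flex: "1 1 auto",
      // Explicitly allow shrinking below the content height so extra rows
      // scroll inside the wrapper instead of stretching the column.
      minHeight: 0,
      overflowY: "auto",
    },
    pagination: {
      // Never grow or shrink, so it always stays visible at the bottom.
      flex: "0 0 auto",
    },
  };
}
```

In a React component, `column` would go on the left column's container, `tableWrapper` on a div wrapping the `<table>`, and `pagination` on the pagination element that follows it. The same rules can of course live in a stylesheet instead of inline styles.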

neverinfamous · November 21, 2025, 11:17am

There seems to be a consensus that Gemini 3 Pro is significantly better at designing complex front ends than Claude Sonnet 4.5, but also more expensive. Other than that, my impression is that they are fairly equal. I haven’t seen a detailed comparison with the ChatGPT 5.1 models yet.
