Claude Sonnet 3.5 Agent is sooo much better than o3 mini high

nikola-mitrovic · February 17, 2025, 10:48am

o3 mini high loses so much context all the time, even though it should have better context because of the thinking time. Its code and problem solving are also much worse.

o3 mini high is good for creating long step by step plans but claude sonnet is soo much better at implementing the plan and solving coding problems.

saketsarin · February 17, 2025, 11:09am

there’s no model that’s better than Claude sonnet in coding yet. o3-mini is good but still not good enough

maxmini1 · February 17, 2025, 11:28am

Sonnet has been unmatched in coding and agentic performance since its launch in June 2024.
While other thinking models like R1, o3 mini and so on may be better for complex debugging, they all struggle with agentic performance such as applying code, and the biggest problem is that these models don't have any sense for frontend design. I don't know how Anthropic achieved it, but Sonnet creates stunning websites with the right prompt. My gut feeling also says they are improving this model silently, but I can't prove it. Today I made new websites and again felt that it has improved over the past few days. All in all, there is not a single competitor to Anthropic in daily automatic coding. And this makes me kinda sad, because the market needs competition to push it to the limits.

Jackjojs · February 17, 2025, 12:58pm

I find Deepseek-r1 to be the best at planning personally

MaxTeabag · February 17, 2025, 5:28pm

+1, sonnet3.5 is miles ahead on the agentic side fetching context etc.

stix121 · February 17, 2025, 6:57pm

100% agree. Would absolutely love to know what they have done differently to all the rest!

Nafnlaus · February 17, 2025, 7:07pm

In my experience, o3 is better at “analyzing the big picture”. But its API is buggy and frequently fails to apply changes, and the model will be lazy and even outright gaslight you at times. Claude is the better coder, albeit with caveats: it sometimes goes off and does its own unrelated thing, doesn’t seem to see quite as much of the big picture, and has a tendency to “wallpaper over problems”.

Each has its own role, though in general, Claude is better. But if there’s a difficult problem, I recommend doing one round of o3 agentic at the top of the thread, tasked with analyzing the codebase for the problem, before handing off to Claude.

gschlomk · February 17, 2025, 9:05pm

My observations exactly. o3 is very good at creating abstract outlines of things to be done, but it fails miserably at actually doing them. I will try your approach: using o3 to create a road map and Sonnet for a step-by-step implementation.

idham · February 18, 2025, 7:46am

I use o3-mini high because it’s still good at one-third the price of 3.5 Sonnet, even though there are often issues with applying changes or generating code. I use Sonnet 3.5 for solving more complex problems.

dotowl · February 18, 2025, 11:02am

I feel that if the Cursor team manages to build something like Windsurf's Cascade flow, we're gonna be so godlike, regardless of models.

diwayou · February 19, 2025, 1:17am

Programming calls for different models in different scenarios, so it is difficult for one model to suit every situation. For example, Claude is suitable for quickly creating overall project files and framework structures. o3-mini and r1 are suitable for writing algorithms that need complex logical reasoning. o1-mini is suitable for problems whose logical complexity falls between Claude and o3-mini, and its text capabilities are much stronger than o3-mini's. deepseek-v3 can also solve many ordinary programming problems, but it is a bit slow. gpt4o-mini is only suitable for very simple problems, but it is very fast. In large projects, local generation using Tab and Ctrl+K is more stable, and I am still exploring how to make good use of Composer in large projects. For a given problem, you can try different models to see which one can handle it.

This is a gameplay problem regarding an item in a game:

Two Heart Shields of the same star level can be synthesized into a Heart Shield one star level higher, up to a maximum of nine stars. The newly synthesized Heart Shield inherits the attributes of the two Heart Shields used to make it.

The rules are as follows:

1 star (poison resistance reduction) + 1 star (poison attack) = 2 stars (poison resistance reduction, poison attack)

1 star (poison resistance reduction) + 1 star (poison resistance) = 2 stars (poison resistance reduction, poison resistance)

2 stars (poison resistance reduction, poison attack) + 2 stars (poison resistance reduction, poison resistance) = 3 stars (poison resistance reduction, poison attack, poison resistance)

1 star (poison resistance reduction) + 1 star (poison resistance reduction) = 2 stars (poison resistance reduction + random attribute)

Heart Shields of different star levels can hold different numbers of attributes:

1-star Heart Shield has 1 attribute

2-star Heart Shield has 2 attributes

3-star Heart Shield has 3 attributes

4-star Heart Shield has 4 attributes

5-star Heart Shield has 5 attributes

6-star Heart Shield has 5 attributes

7-star Heart Shield has 6 attributes

8-star Heart Shield has 6 attributes

9-star Heart Shield has 7 attributes

From the above rules, it can be seen that the attributes of a synthesized Heart Shield are generated from the attributes of the materials used. However, if the resulting Heart Shield can hold more attributes than the union of the two materials' attributes, the missing attributes are filled in randomly. Therefore, to reliably obtain a Heart Shield with the desired attributes, this situation must be avoided. Also, only Heart Shields of the same star level can be merged.
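
The rule in the paragraph above can be checked mechanically. Here is a minimal Python sketch, assuming attributes are plain strings and that a merge only counts as stable when the union of the two inputs exactly fills the capacity of the result. The post does not say what happens when the union is larger than the capacity, so that case is treated as unstable here. The names CAPACITY and merge_is_stable are made up for illustration.

```python
# Attribute capacity per star level, copied from the list in the post.
CAPACITY = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 5, 7: 6, 8: 6, 9: 7}


def merge_is_stable(left_attrs, right_attrs, star):
    """Return True if merging two `star`-level shields adds no random attribute.

    Assumption: the merge is stable only when the union of both inputs
    has exactly as many attributes as a (star + 1)-level shield can hold.
    """
    if star + 1 not in CAPACITY:
        raise ValueError("cannot merge beyond nine stars")
    union = set(left_attrs) | set(right_attrs)
    return len(union) == CAPACITY[star + 1]


if __name__ == "__main__":
    # Two different 1-star shields fill both slots of the 2-star result.
    print(merge_is_stable({"poison attack"}, {"poison resistance"}, 1))
    # Two identical 1-star shields leave one slot to chance.
    print(merge_is_stable({"poison attack"}, {"poison attack"}, 1))
```

Note that going from 5 to 6 stars (and from 7 to 8) does not add a slot, so under this reading the two inputs must carry the same attributes at those levels.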

Now I have plenty of 1-star material Heart Shields with different attributes. Please write Python code that takes the target attributes and target star level as input and generates the synthesis steps.
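
One possible way to approach this request is to work backwards from the target. This is only a sketch, not the tool mentioned in this post. It assumes the target lists exactly as many distinct attributes as its star level can hold, and that a merge is stable only when the union of the two inputs exactly fills the result. When the level below holds the same number of attributes, two identical shields are merged; when it holds one fewer, each input drops a different attribute. The names CAPACITY, split and plan are hypothetical.

```python
from collections import Counter

# Attribute capacity per star level, copied from the list in the post.
CAPACITY = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 5, 7: 6, 8: 6, 9: 7}


def split(attrs, star):
    """Pick the attribute tuples of the two (star - 1) inputs for one shield."""
    if CAPACITY[star - 1] == len(attrs):
        # Same capacity one level down: merge two identical shields.
        return attrs, attrs
    # Capacity grows by one: drop a different attribute from each input,
    # so the union of the two inputs is exactly `attrs`.
    return attrs[:-1], attrs[1:]


def plan(target_attrs, target_star):
    """Return merge steps as (count, left, right, result), lowest star first.

    Each shield is described as (star, attribute_tuple); `count` is how
    many times that merge has to be performed.
    """
    if target_star not in CAPACITY:
        raise ValueError("star level must be between 1 and 9")
    attrs = tuple(sorted(set(target_attrs)))
    if len(attrs) != CAPACITY[target_star]:
        raise ValueError(
            f"a {target_star}-star shield needs exactly "
            f"{CAPACITY[target_star]} distinct attributes"
        )
    needed = Counter({(target_star, attrs): 1})
    steps = []
    for star in range(target_star, 1, -1):
        # sorted() takes a snapshot, so adding lower-level entries is safe.
        for (level, wanted), count in sorted(needed.items()):
            if level != star:
                continue
            left, right = split(wanted, star)
            needed[(star - 1, left)] += count
            needed[(star - 1, right)] += count
            steps.append((count, (star - 1, left), (star - 1, right), (star, wanted)))
    steps.reverse()
    return steps


if __name__ == "__main__":
    wanted = ["poison resistance reduction", "poison attack", "poison resistance"]
    for count, left, right, result in plan(wanted, 3):
        print(f"{count}x: {left} + {right} -> {result}")
```

A shield of star level n built this way always consumes 2^(n-1) one-star materials, so a nine-star target needs 256 of them; the plan only decides which attributes those materials must carry.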

This tool was generated using Cursor through multiple steps and different models, utilizing PyQt5.
The entire text was translated from Chinese to English using GPT-4o-mini.

My English is not good, so I wrote this post in Chinese first; the text above is its English translation.

jazzmonger · February 20, 2025, 4:55pm

I noticed there are several different Sonnet models. What's the difference between them? Should they all be enabled?

oxlikesmath · February 20, 2025, 7:01pm

I’m suspecting that Sonnet is iteratively updated, but the updates are not announced.

shanithakur · February 21, 2025, 1:48pm

100%
