o3-mini is much cheaper.
claude-3.7-thinking is more expensive.
o3-mini needs multiple prompts to reach the same commit (sometimes it finishes without even a summarized conclusion or reaction).
claude-3.7-thinking will use more tools, try to analyze more things, appear to go off track several times, then come up with a complete solution that also touches other features, just because it felt like doing it in the same prompt.
o3-mini lags at first, then comes up with a compact (barely sufficient) code solution, then polishes the implementation based on tool use and prompt feedback.
claude-3.7-thinking shows a plan in the first 3 seconds, then uses tools, searches for things and starts changing things all over the place, with other steps in between. You can interrupt it to add to, clarify or finalize an implementation or processing effort.
o3-mini writes 20 lines of code to fix the issue after at least 3-4 chained prompts. It lacks creative flair when it comes to code layout, and its results are badly colorized.
claude-3.7-thinking writes 200 lines of code that fix the issue on the first try, with elegant formatting and fancy execution results (sometimes it also fixes other issues in the same prompt).
In theory o3-mini should be cheaper to use. In practice, once you count every prompt needed to reach a final result, the cost per finished fix favors claude-3.7-thinking. Total human watch time looks similar for both.
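To make the cost argument concrete, here is a rough sketch of the arithmetic. The prompt counts come from what I saw (3-4 chained prompts vs. 1 try, with 3.5 used as a midpoint), but the per-prompt prices are made-up placeholder units, not real pricing, and the function names are just for illustration.

```python
def cost_per_fix(cost_per_prompt: float, prompts_needed: float) -> float:
    """Total spend to reach one finished fix: price per prompt times prompts used."""
    return cost_per_prompt * prompts_needed


def break_even_price_ratio(cheap_prompts: float, expensive_prompts: float) -> float:
    """How many times pricier per prompt the expensive model can be and still tie."""
    return cheap_prompts / expensive_prompts


# Hypothetical placeholder prices in arbitrary units (NOT real pricing).
O3_MINI_PRICE = 1.0
CLAUDE_PRICE = 3.0

# Prompt counts from the observations above: 3-4 chained prompts vs. 1 try.
o3_total = cost_per_fix(O3_MINI_PRICE, 3.5)
claude_total = cost_per_fix(CLAUDE_PRICE, 1)

print(f"o3-mini per fix: {o3_total}")
print(f"claude-3.7-thinking per fix: {claude_total}")
print(f"break-even price ratio: {break_even_price_ratio(3.5, 1)}")
```

Under these assumed numbers, the pricier model comes out ahead as long as a single prompt costs less than about 3.5 times as much as an o3-mini prompt. Plug in your own measured costs and prompt counts to see where you land.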
Does anyone else feel the same?