Has anyone tried writing a lot of code through it?
What are the results, the impressions, the size of the hole in your pocket?
I got some useful answers in a similar post about Opus, but not one here yet.
o3 failed a lot with my code, and I mean a lot.
o3 or o3-Pro?
Both.
I honestly don’t know what the rage for o3 is about; for complex programming tasks it performs horribly.
I must agree: o3 is not usable in my experience. I never had any luck with it… It takes a long time before it starts making any changes, and they are nearly always completely incorrect. Also, since the pricing changed, each o3 prompt costs me around $0.70, so I don’t even want to see the bill for o3-Pro.
o3 didn’t work well when I tried it out :((
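To put the roughly $0.70 per prompt figure in perspective, here is a back-of-the-envelope sketch. Only the per-prompt price comes from the post above; the prompt volume and number of working days are made-up assumptions, so plug in your own numbers.

```python
# Rough monthly cost estimate for per-prompt pricing.
# Only the ~$0.70 per prompt figure comes from the post above;
# the prompt volume and working days are hypothetical.

COST_PER_PROMPT_CENTS = 70  # ~$0.70 per o3 prompt, as reported above


def monthly_cost_dollars(prompts_per_day: int, working_days: int) -> float:
    """Return the estimated monthly spend in dollars."""
    # Work in integer cents to avoid floating-point rounding surprises.
    total_cents = COST_PER_PROMPT_CENTS * prompts_per_day * working_days
    return total_cents / 100


if __name__ == "__main__":
    # Hypothetical usage: 30 prompts a day over 22 working days.
    print(f"${monthly_cost_dollars(30, 22):.2f}")
```

With those assumed numbers the estimate comes to $462.00 a month, which is why heavy agent-style use adds up quickly.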
It’s pretty bad for me; I can say it’s completely useless. Also, the regular version is better than the pro one, which is strange.
Not sure how representative this is, but here’s a case:
https://github.com/Artemonim/AgentDocstrings/pull/14
git reset --soft
. I then tried to resolve the issues using Gemini 2.5 Pro, but it failed to fix them.
o3 is good for planning and inspecting bugs and creating docs (good tool calls), but executing is meh. I use it to make todos and use Sonnet 4 to implement them
Well, even Gemini 2.5 Flash is good at that, and it is free…
My take: o3 is solid for really complex problems when other agents get stuck - it’s great at identifying root causes and giving detailed analysis. But man, it’s expensive as hell.
I’ve found a better workflow using Opus for debugging and Sonnet 4 for implementing, but honestly it depends on your experience and prompting skills. I’m still learning a lot of this stuff and not great at prompting yet, so o3 actually helps way more since it can work with my mediocre prompts better than smaller agents.
The one place o3 really shines is app architecture reviews - minimal prompting needed for solid feedback and reasoning. For building small/medium apps I rarely use o3 unless I wanna do something fancy or need to compare and contrast approaches.
When I do need architecture brainstorming/feedback, I prefer using it in ChatGPT over Cursor. In Cursor it drains my usage fast and the refresh wait is brutal, but in ChatGPT I can manage my usage better and know exactly how much I’m spending.
Haha okay, confession time - this was totally a joke! I saw you ask the exact same question about Opus earlier, so I decided to be cheeky and just swap my entire response. Literally did a find-and-replace from ‘Opus’ to ‘o3’ and ‘Claude’ to ‘ChatGPT’. Couldn’t resist when I noticed the identical phrasing.
Realistically speaking, my experience with o3 is that it’s great at troubleshooting or finding errors that Sonnet 4 or GPT 4.1 spend forever on or just can’t solve. Of course understanding structure, tools, and prompting matters - again comes down to your skills and prompt engineering.
But I only use it for finding complex logic errors and fixing those specifically. Outside of that, I don’t find it more useful for regular tasks - I prefer Sonnet 4 or GPT 4.1 in those cases. In my experience with continuous tasks, o3 fails more often or can’t stick to what I’m asking, although the recent updates with To-dos made it better at staying on topic.
An even better workflow is using o3 to find all the errors, identify and analyze the root problems, and build a plan for fixing them, then letting regular agents implement it - basically not wasting too many credits. With proper prompts I’ve sometimes managed to use Auto to complete fixes based on o3’s feedback on the issue.
Still, for continuous work or regular daily use I stick with Sonnet 4 or GPT 4.1, and only pull out o3 when I’m stuck or need feedback on logical errors that normal agents can’t handle.
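The split described above (o3 for diagnosis and planning, a cheaper agent for implementation) can be sketched as a tiny routing helper. This is only an illustration of the idea: the task categories are hypothetical, and the model names are plain labels taken from this thread, not real API identifiers.

```python
# Sketch of the "o3 plans, cheaper agent implements" split described above.
# Model names are plain labels from this thread, not API identifiers,
# and the task categories are hypothetical.

DIAGNOSIS_TASKS = {"find_errors", "root_cause_analysis", "fix_plan"}


def pick_model(task: str) -> str:
    """Return the model label to use for a given (hypothetical) task type."""
    if task in DIAGNOSIS_TASKS:
        return "o3"
    # Everything else goes to the everyday agent (Sonnet 4 here; GPT 4.1 would also fit).
    return "Sonnet 4"


def assign(tasks: list[str]) -> list[tuple[str, str]]:
    """Pair each task with the model label chosen for it, preserving order."""
    return [(task, pick_model(task)) for task in tasks]


if __name__ == "__main__":
    for task, model in assign(["root_cause_analysis", "fix_plan", "implement_fix"]):
        print(f"{task}: {model}")
```

The point of the design is simply that the expensive model only sees the diagnosis and planning steps, while the implementation steps default to the cheaper one.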
Regarding o3-Pro, I haven’t found any use cases where I need it. Regular o3 was more than enough, and o3-Pro gave pretty much the same results at a much higher cost. But I also don’t develop complex apps.
Apart from this being an LLM thread, o3 is good for planning, and better for revising plans from Gemini Pro / Sonnet 4 Thinking. Will give it a go for one-pass planning.
Compared to what models? And for what programming tasks? I think o3 does best for planning and backend, whereas the Sonic models do way better for frontend.