I tried a single prompt with o3 since I had multiple tasks for the LLM. o3 completed maybe 5% of what I told it, and I'm being generous: the solution it gave for a rather simple PHP problem was wrong, it ignored the other instructions and tasks, consumed around 90K tokens, cost me 50 requests, tried to run a Python command that doesn't exist (let alone in a PHP repository), and failed to use the MCP server.
None of these problems happen with other models, and before 0.50 o3 was clearly the superior option; I used it frequently on the exact same codebase.