I might understand why the openAI models, like GPT4o and O3-mini do not work that well with Agent mode. They don’t seem to be consistently reliable in following instructions over longer contexts. And my guess is that this is why they quickly bug out in agent mode interactions, especially as the conversation gets slightly longer.
I’ve found deepseek-v3 however to be a very reliable and consistent model when it comes to following strict instructions over a long conversation, without overthinking it. And because of this my guess is that it may be a good model for agent mode, perhaps better than Claude 3.5 sonnet. And even if not always better, it would be good to have a properly working alternative to Claude 3.5 in my opinion.
What do you think? And do you share my experience?