Which is better? Please share your experience.
I can’t stand o4-mini’s lack of context… it’s like 50 tool calls and no information about what it is doing.
In a very early chat during my testing, Claude 3.7 trivially changed a unit test in a TDD workflow to make the test case pass, and I just lost trust. Its tool calling was better than 2.5 pro exp 3/25, though, and it gave more context to the human overseer.
Gemini 2.5 pro is my favorite but has issues with tool calling (edit_file issues in another post as well). It does a great job of keeping me in the loop and hasn’t taken any trust-breaking shortcuts, although it does tend to apply the simplest fix, with explanations. I’d rather wrestle with tools than bad or trivial code. Can’t wait for the new 2.5 pro exp 5/6/25 model to be in Cursor!
Claude has a better tool calling system; it doesn’t fail as often as Gemini. Gemini is great, but it struggles with tool calls. Check the image below: this is a new chat, and it still keeps failing. Currently, I use Claude for UI/UX and Gemini for everything else.
Based on what I have read on the forums, the current model gemini-2.5-pro-exp-03-25 is actually the latest one under the hood.
Thanks for the model context! Sounds like Gemini still needs to get better at tool calls and at not generating the termination token when a response is unfinished, unfortunately.