Context
I’m not entirely sure why everyone is so obsessed with Claude. Sure, it’s solid as an SWE, but it’s not the smartest model out there. And if you take a look at the benchmarks, the “Thinking” version isn’t that far ahead of the regular one. Certainly not twice as good.
From what I understand, based on reading its “Thinking” outputs, that mode acts more like a TODO list: it lets the model be a bit more efficient, that’s all. Also, Claude has actually been the most expensive model for over a month now. Even on the old plan, it consumed twice as many requests after Anthropic’s discount promo ended. Did no one else notice this? It’s written right there in the UI! (well… it was, before yesterday)
o3 is smarter than Claude, cheaper than Claude-4-Thinking, and it makes good use of tools. But yeah, it’s still pricey.
Auto: I don’t know how it works for you, but for me it’s always GPT-4.1 now. Great for simple tasks: super fast and very capable as an Agent. It’s as free as Gemini 2.5 Flash, but faster and smarter.
o4-mini: Not as good as o3, but with similar Agent capabilities. It’s weaker at handling context (in my case, it once removed important functionality when it was only supposed to fix a bug and keep the rest of the logic intact).
Gemini 2.5 Pro: Better than everything else. It did have major issues with edit_tool, but that seems mostly fixed now. Sometimes it still fumbles, but it’s rare. For the tasks I give it, it consistently outperforms o3. And it’s cheaper than o3 (calculated in AI Studio using my Subscription Usage Summary) and much cheaper than Claude-4-Thinking (the Cursor team even acknowledged this in a public apology: 225 vs 550).
So which models should we use?
- If you’re on the Ultra plan, there’s really no reason to use anything but Gemini 2.5 Pro Max. Maybe try o3-Pro or Opus, but personally I don’t even touch them; I’m afraid they’ll burn a hole in my pocket. If Gemini can’t handle a task, switch to o3 or Claude to get a different take on the problem. But more likely, the issue is either poor prompt engineering, poor context engineering, or the task is simply too complex for current LLMs.
- If you’re on Pro or Pro+ and willing to pay a little extra to get the most out of Cursor, I’d recommend sticking with Auto until it starts to get dumb; then switch to o4-mini; and if both fail, go with Gemini 2.5 Pro, even though it’s about 3× more expensive. I also highly recommend using Gemini for key architectural decisions or planning major refactors.
P.S. I don’t have a long professional track record to prove my word is trustworthy, but over the past couple of months I’ve built a few projects in Cursor that you can judge for yourself. I also have four more in private repos at the moment.
- GitHub - Artemonim/Artemonim-Little-Tools: A modular and extensible suite of command-line utilities designed for various media, text, and AI-driven processing tasks. The project is built with a plugin-based architecture, making it easy to add, remove, or develop new tools.
- GitHub - Artemonim/AgentDocstrings: A command-line tool to auto-generate and update file-level docstrings summarizing classes and functions. Useful for maintaining a high-level overview of your files, especially in projects with code generated or modified by AI assistants.