GPT-5-Mini is a great value

I’ve been using GPT-5-Mini instead of Auto and it has been nailing bugs again and again. I give it clear context and code snippets and directions and it solves issues while Auto keeps going off-track. I say this because this model is super cheap compared to how effective it is. I know it has limits, but with very focused requests it’s doing great.

Look at the comparison to Sonnet-4 in cost and token utilization.

I think Cursor may be doing some optimization with GPT-5, at least with mini, because it uses barely any tokens. So not only is it cheaper per token, it also uses fewer tokens. I have given it several files with a few thousand lines of code and it had no trouble. I assume if I had it go through my entire codebase it would struggle, but smaller context is where it seems most effective.

It should cost about 1/5 of gpt-5 requests, but it seems cheaper than 1/5th cost based on what I am getting out of it.
https://openai.com/api/pricing

For those who are worried about Auto not being unlimited: how well Cursor is working with these smaller models is making me less concerned. I may end up not using Auto much with models like gpt-5-mini. (I know this may have been Cursor’s plan from the beginning: make Auto less effective so we don’t use the unlimited model anymore, but still.)

it seems cheaper than 1/5th cost based on what I am getting out of it

it’s probably because it thinks less.

All the thinking counts as output tokens (well duh, it is output as well). A model that thinks less (or doesn’t think at all) costs much less for agentic flows, where a bunch of turns means a bunch of long thinking sections in which it repeats the same mantra-like stuff over and over.
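To make the point about thinking tokens concrete, here is a minimal sketch of how a multi-turn agent run adds up when reasoning is billed at the output rate. The prices and token counts are placeholders invented for illustration, not real rates (check the provider's pricing page), and the type and function names are hypothetical.

```typescript
// Placeholder prices and token counts: NOT real rates, illustration only.
interface Price {
  inputPerMTok: number;  // price per million input tokens
  outputPerMTok: number; // price per million output tokens
}

interface Turn {
  inputTokens: number;
  visibleOutputTokens: number;
  reasoningTokens: number; // thinking tokens, billed at the output rate
}

function turnCost(turn: Turn, price: Price): number {
  // Reasoning is charged on top of the visible answer, at the output rate.
  const billedOutput = turn.visibleOutputTokens + turn.reasoningTokens;
  return (turn.inputTokens * price.inputPerMTok + billedOutput * price.outputPerMTok) / 1e6;
}

function runCost(turns: Turn[], price: Price): number {
  // An agentic run pays for every turn, so long thinking sections multiply.
  return turns.reduce((sum, turn) => sum + turnCost(turn, price), 0);
}

// Build a run of ten identical turns that differ only in how much the model thinks.
const tenTurns = (reasoningTokens: number): Turn[] =>
  Array.from({ length: 10 }, () => ({
    inputTokens: 20000,
    visibleOutputTokens: 500,
    reasoningTokens,
  }));

const placeholderPrice: Price = { inputPerMTok: 1, outputPerMTok: 8 };
const heavyThinker = runCost(tenTurns(4000), placeholderPrice);
const lightThinker = runCost(tenTurns(200), placeholderPrice);
```

With these made-up numbers and the same price sheet, the run that thinks heavily costs a bit over twice as much as the one that barely thinks, purely because of the extra output tokens per turn.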

And sonnet, even while not thinking, just costs a f*ing lot for no reason.

So I totally agree, mini is soooo worth it in terms of its pricing and reduced thinking (which isn’t even needed for simple tasks with direct instructions, as it doesn’t tend to overthink every little detail and just gets the job done).

In fact, before mini my favorite model to work with for automating simple repetitive edits was 2.0 flash because of its speed, no thinking and cheap price while being somewhat accurate. And mini blows it out of the water, being almost 100% obedient to the instructions. It is slower, but at least insanely cheap.

Never understood all the hype around claude models for non vibe-coding i.e. real work. Stupidly expensive, not obedient at all even compared to f*ing 2.0 flash and with tendencies to overextend and overengineer on almost every task I gave it.

Are you running it on max mode?

I wouldn’t say Sonnet costs more for no reason. I’ve been using GPT, Sonnet, Deepseek, Kimi K2, and a few other models. My experience, having used GPT-5 a lot since it was introduced, is that Sonnet is still a better model.

Despite the improvements with GPT-5, over the last two days I discovered that it is making some BAD code edits. One example: dynamic TS imports, to import a type that was already imported at the top of a file, inside of a Promise<T> generic, as the return type of a function….

someMethod(…): Promise<(await import('./path')).SomeType> { … }

I mean, WAT!?!?! I found my code from the last couple of days RIDDLED with that kind of thing, as well as a whole slew of other small issues that all added up to code that did not do what it was supposed to and caused runtime errors. Other examples of just bad code… frequent casting of things as `any`. I even had a ton of `(this as any)?.blah?.blah?.()` when the service being used was already injected, was always a real instance, and worked perfectly fine: `this.blah.blah()` would have been perfectly fine! Countless cases of ROGUE `unknown as something` casts, because GPT-5 insisted that it was working around TypeScript assignment restrictions… which… WHY?!?!?!? That’s WHY WE USE TYPESCRIPT!! Don’t bypass the type checker, dimwit!!
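For readers who have not run into these patterns, here is a small sketch contrasting the cast-heavy style described above with a typed equivalent. All names (`Logger`, `ReportService`, `User`, `isUser`, `parseUser`) are hypothetical and do not come from the poster's codebase. The type guard is one common alternative to `unknown as` casts, not necessarily what the cleaned-up code used.

```typescript
// Hypothetical names throughout; nothing here comes from the thread's codebase.
// For the import case, a type already imported at the top of the file can be
// used directly in the return type:
//   import type { SomeType } from './path';
//   someMethod(): Promise<SomeType> { ... }

class Logger {
  readonly lines: string[] = [];
  log(message: string): void {
    this.lines.push(message);
  }
}

class ReportService {
  private readonly logger: Logger;

  // The dependency is injected and is always a real instance.
  constructor(logger: Logger) {
    this.logger = logger;
  }

  // Cast-heavy style: `any` hides typos, and the optional chaining guards
  // against a missing member that cannot actually be missing.
  runWithCasts(): void {
    (this as any)?.logger?.log?.("report started");
  }

  // Typed equivalent: the compiler checks the member name and the argument.
  run(): void {
    this.logger.log("report started");
  }
}

interface User {
  id: number;
  name: string;
}

// A type guard narrows `unknown` with a runtime check instead of asserting
// the shape with `value as unknown as User`.
function isUser(value: unknown): value is User {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return typeof candidate["id"] === "number" && typeof candidate["name"] === "string";
}

function parseUser(json: string): User {
  const parsed: unknown = JSON.parse(json);
  if (!isUser(parsed)) throw new Error("payload is not a User");
  return parsed;
}
```

Both `run` and `runWithCasts` do the same thing at runtime when everything is wired correctly. The difference shows up when something is renamed or mistyped: the typed version fails at compile time, while the cast version silently does nothing.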

Attempts to use GPT-5 to resolve these issues resulted in each of them becoming even more complex, convoluted, and weird, and causing even worse runtime issues.

Sonnet, though? Psh, no problem. Cleaned it all up right away, produced beautiful code, didn’t introduce any more weirdness, and I was able to move on with my day.

Interesting thing about cost… GPT-5 may be half the cost per MTok, but the lost time piles up: time lost waiting on the LONG thinking cycles (same problem with Gemini 2.5), then time lost to non-functioning code, then further time lost fixing broken code, then more time lost dealing with WORSE issues due to bad fixes, then MORE time lost either fixing it yourself (which I did do some of yesterday, just to wrap my head around what the heck was going on and why) or switching to another model (more expensive, at least on a per MTok basis) to fix the issues, etc. etc.

The interesting thing about cost is that it’s not so much about Sonnet’s higher MTok cost. I’m far more expensive than Sonnet. The WASTE OF MY TIME wastes the time and money of the company I am working for. Further, when you burn time and tokens FIXING what a model of lesser quality BROKE, you ultimately burn your “it costs half as much” buffer, AND THEN SOME, and quite likely end up in the inverse situation: GPT-5 + screwups + screwup fixes + screwups of screwup fixes + manual fixing of all the screwups… your total token cost is more like 4x that of Sonnet, which usually (at least IME, speaking for myself) gets things right in one go.
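The argument can be put as rough arithmetic: total cost is tokens per attempt times the number of attempts, plus the developer time spent on each attempt. The sketch below uses made-up placeholder numbers (including the hourly rate) and hypothetical names. It only shows the shape of the calculation, not measured results for any model.

```typescript
// Every number below is a made-up placeholder for illustration.
interface ModelProfile {
  costPerAttempt: number;        // token spend for one attempt at the task
  attemptsNeeded: number;        // first try plus any fix-up rounds
  reviewHoursPerAttempt: number; // developer time spent waiting, reviewing, fixing
}

interface CostBreakdown {
  tokens: number;
  time: number;
  total: number;
}

function totalCost(model: ModelProfile, hourlyRate: number): CostBreakdown {
  const tokens = model.costPerAttempt * model.attemptsNeeded;
  const time = model.reviewHoursPerAttempt * model.attemptsNeeded * hourlyRate;
  return { tokens, time, total: tokens + time };
}

// Half the price per attempt, but three extra rounds of fixes.
const cheaperModel: ModelProfile = { costPerAttempt: 1, attemptsNeeded: 4, reviewHoursPerAttempt: 0.5 };
// Twice the price per attempt, right the first time.
const pricierModel: ModelProfile = { costPerAttempt: 2, attemptsNeeded: 1, reviewHoursPerAttempt: 0.5 };

const hourlyRate = 100; // placeholder developer rate
const cheaper = totalCost(cheaperModel, hourlyRate); // tokens 4, time 200, total 204
const pricier = totalCost(pricierModel, hourlyRate); // tokens 2, time 50, total 52
```

Under these assumptions the developer-time term dwarfs the token term, so the number of attempts matters far more than the per-token price. With a task that the cheaper model gets right in one go, the same formula favors the cheaper model.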

And again, I am WAY more expensive than the model or MTok cost. So when MY time is wasted by a model? It’s significantly more expensive than even what I mentioned above. Don’t dismiss Sonnet just because its MTok cost is higher. What do YOU cost, and what do model screwups cost in terms of YOU and your billable time???

yeah I see your point, but also our use-cases for agents are different. I almost never let them implement things on their own, and when I do, it’s some repetitive boilerplate stuff. I don’t use any of the “best” models for it, I just point to a template / example and instruct it to do basically the same exact thing in all the same places.

For example, when I was refactoring entity services to a new pattern recently, I just finished documentation I would write anyway (in my case it’s JSDocs for the factory class and a README for the pattern itself), implemented it for one of like 50 entities, and told the “smart” agent to examine all of this and generate a coherent set of instructions on how to do it generally for any other entity, skipping all complex stuff and making just interfaces instead. Then I fed these instructions to a new “dumb” agent to do the thing for a particular entity. It did the job (poorly, but I didn’t expect much, I just needed completed boilerplates so I can finish migrating more advanced logic by myself).
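The workflow above boils down to generating the instructions once and then reusing them per entity. Here is a rough sketch of that fan-out step. The entity names, prompt wording, and the `buildTasks` helper are all hypothetical, and no model API is called: it only builds the prompt text you would hand to the cheaper agent.

```typescript
// Hypothetical sketch: entity names and prompt wording are invented,
// and nothing here calls a model or any Cursor API.
interface MigrationTask {
  entity: string;
  prompt: string;
}

function buildTasks(
  instructions: string,    // written once by the "smart" agent
  referenceEntity: string, // the entity that was migrated by hand
  entities: string[],
): MigrationTask[] {
  return entities
    .filter((entity) => entity !== referenceEntity) // skip the hand-written reference
    .map((entity) => ({
      entity,
      prompt: [
        instructions.trim(),
        `Reference implementation: ${referenceEntity}.`,
        `Apply the same pattern to: ${entity}.`,
        "Generate interfaces and overall structure only. Do NOT touch complex logic.",
      ].join("\n"),
    }));
}

const tasks = buildTasks(
  "Follow the README for the new service pattern.",
  "Invoice",
  ["Invoice", "Customer", "Order"],
);
```

Each task gets the same instructions and the same scope restriction, so the only thing that varies between runs is the entity name. That keeps the request small and focused, which is where the thread reports the smaller models doing best.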

I was experimenting with gpt-5 during promo week so “smart” model was gpt-5-high. For the “dumb” models I tried some different ones, and mini performed pretty well actually. And that’s even before they fixed edits for new gpt models. It almost didn’t struggle.

Well, the best performer in terms of speed vs accuracy was actually “auto”, which probably used sonnet. I think that not because it wrote good code, but because it often tried to do all the work itself even when the instructions stated TO NOT touch complex logic and to focus on completing the overall structure and interfaces. Mini was kinda slow but at least it honored the instructions. And 2.0 flash did poorly as always, often missing some of the listed methods, but it was quick as hell.

So, for strict instruction following I’d actually never touch sonnet, as it performs badly in my experience. But for just doing the thing ~ right without you holding its hand it is still the best, I guess…

Oh, and you’re speaking of plain gpt-5, right? Not the mini version? Because full gpt-5 is indeed kinda stupid with writing code itself even with instructions, for some reason (I guess it’s because it often thinks in the wrong direction).

I don’t think our approaches are very different. I spend a lot of time planning up front, before I let the agent loose with my plan. However, fixing issues, is often not a plannable thing, so that’s a bit different. Thing is, my planning was detailed, but, not down to the line of code level (if it was, I’d have just written the code myself.) I was honestly surprised when I really took a close look at all the code GPT-5 has written the last several days or so, and to see just how bad it was. The sheer inconsistency of it all, DESPITE having it implement according to detailed user stories in Linear (accessed via MCP.) I also have an extensive cursor rule set that governs code style, software design, architecture, and numerous other things.

I actually developed a specific paradigm that I call PRACT (short for PRACTical):

  • Plan
  • Research & Refine
  • Actualize & Act
  • Complete
  • Terminate

I’ve written about it in other posts. Anyway, I am not vibe coding. I do plenty of detailed planning before I set the agent loose using epics and stories to guide it.

I originally used Sonnet for most things, and it seemed to do fine. It’s a bit slow, although now GPT-5 seems just as slow (it seemed fast at first, maybe OpenAI dedicated more resources to it when they first released it to rope everyone in? The ol’ bait-n-switch?)

You are correct, though… I’ve been talking about either gpt-5 or gpt-5-fast. I have not used -high, and I have used -low for some things. I think I actually used -mini on some planning stuff. I don’t think I have used it to code yet, but I’ll give it a try here and see how it goes. I honestly don’t think it could be WORSE than the normal model! Man, not after what I just saw and went through the last 12 hours! :open_mouth: I’m gonna switch to mini right now, in fact, as I’m kind of fed up with gpt-5 normal mode.

What you describe about the model Auto used, touching code you did not want it to? That makes me think of Gemini. I have not found that Sonnet touches a lot of code you tell it not to. Very occasionally, I’ve found stray edits. The model that DOES do that a lot, though, is Gemini. I call it “The Bulldozer” because Gemini LOVES to touch swaths of unrelated code and completely rewrite them. I had a lot of problems with Gemini earlier on when I was trying out different models (and before I developed PRACT). I don’t think Sonnet does that… bulldoze. It meanders sometimes, but I think now EVERY model does to some degree.

you know what - you actually may be onto something, it could be gemini. I was convinced it was sonnet because of “You’re absolutely right!”-isms but after testing it rn with manually selecting sonnet it seems it doesn’t yap that much as auto did that day.

I should also note that I’ve almost never used sonnet for like several months, so I might be wrong if the Cursor team actually fixed this obedience issue with it. Imma try it out first next time I need agents to see if it really is better now.

But with their pricing I would still likely use gpt 5 mini and 2.0 flash for the most cases :slight_smile:

(also, 2.0 flash seems to be free likely due to a bug cuz it is deprecated but it still does work when added manually even without the google api key lol)

By far, the most disobedient model I’ve used is Gemini. The thing is a freakin bulldozer. It has ruined more code than… well, everything else combined. The thing doesn’t write very good code, and it is hyper-opinionated: it WILL do things its way, and it will often totally commandeer entire existing, working, tested, settled code files and totally rewrite them for its own purposes, having nothing to do with your product. It is the weirdest thing. I can’t even figure out why it behaves like that, but IMO it’s just a broken model, and it WILL screw up your code.

Sonnet is good. It is not perfect, but overall it generally seems to be like a highly knowledgeable junior coder with good programming skill. It makes mistakes, but it doesn’t bulldoze like Gemini. Sometimes it meanders, by which I mean it will occasionally change code that was outside of scope (although, with PRACT, that has lessened considerably and it generally stays within the guardrails set up by story+rules very well).

GPT-5: one of the things I initially liked about it is that it actually has the ability to be more “surgical” than Sonnet. So if you need to be really targeted, it can target specific code better. But… its longer reasoning time seems highly wasteful, and even though it can be surgical, it often doesn’t write good code. So surgical or not, it can sometimes produce just BAD code… I wasn’t aware of how often until the last couple of days, when I did a deep dive on all the code GPT-5 had generated since, oh, probably early last week… and there were just too many issues. I ended up manually fixing a bunch of it, then set Sonnet loose on it, which seemed to clean everything else up quite nicely.

All this said, I don’t know if the behavior of each model is the same for every language or framework. I use web tech, with TypeScript as the primary language. It may be that Sonnet is better with the tech I use, and maybe it’s worse with, say, Java or something else.

No, using my $20/mo usage limits only.

gpt‑5‑mini is genuinely a solid model — and the fact that it’s even available for free (usable after hitting your paid limit) makes it an ideal substitute for Auto mode.

That said, it’s not quite reliable enough to run fully autonomously. And comparing it directly to Claude Sonnet 4 just isn’t fair or objective — they’re not playing in the same league.

The comparison is showing how much more you can get with it if you are not needing something as powerful as Sonnet 4. It’s a suggestion about choosing the right model for the job to get the most out of your monthly usage limits. I hear some people say they just always use the best models for every request and then are surprised how quickly they blow through their limits.

To stay within the subscription limits, it seems you just need to stick with AI autocompletion and avoid using the agent entirely — is that correct?

The thing is, the programming‑AI world is clearly moving toward full vibe‑coding for 90% of common tasks. And it was actually Cursor — through its pricing, quality, and overall convenience — that kickstarted this whole trend in the first place.

But now, Cursor feels like it’s slipping behind because of its pricing, almost as if it’s taking two steps backward. Honestly, it’s painful to watch.

FWIW, I was comparing to gpt-5 :brain: normal. In my case, I may just be a heavier agent user than most, I can’t say. In any case, I found gpt-5-mini to be inferior, but I may just be pushing both models to their limits regardless? :person_shrugging:

The limits of which plan? I have been using the Ultra plan for a while now, and I have not run into the limits yet. I suspect I will before the month runs out; however, it has provided me significantly more usage than the Pro+ plan I was on before, which itself wasn’t bad for moderate usage.

If you are on the Pro plan, it sounds like Pro is just no longer viable for agent usage, and is really only intended for unlimited tab completion.

As others have noted, at least for now, GPT-5-Mini does not appear to count towards the monthly usage, according to the in-application usage limit popup. So I will be using this as my new Auto for the majority of my tasks unless I need a heavier-lifting model.

It’s nice to know what model you are working with, unlike Auto.

I’m a fan of it. Don’t think it’s as good as o3, but it’s not far off. One thing I dislike is that it will often perform Git actions like commits without my permission.