GPT-5 thinking/reasoning cycles BURNING time?

I’ve been using GPT-5 mostly since it was released. I still use Sonnet (thinking and non-thinking) and other models for some things, but most of my work lately has been with GPT-5, mainly to get a good feel for it, and there are some things it does seem to have an advantage at.

However, thus far, my overarching feeling about GPT-5 is that it is WASTING TIME!! Its thinking cycles seem to average around 30 seconds, generally ranging from about 18 seconds to over 60 seconds. For a single prompt in the agent chat, with each run lasting maybe 8-10 minutes (things seem to have become rather slow lately, and I’m not sure it’s limited to GPT-5), there will be 3-5 minutes of time lost to GPT-5’s excessively long “thinking” cycles.

This is no joke. This is a LOT of time. I just sat through a single 1m4s thinking cycle!! Looking at what it thought about, I honestly cannot fathom WHY it thought so long. It was relatively primitive: “ok, so I’m going to be working on unit tests…which files…ah, these files, same files I’ve been working with all along, ok so what do I do, oh, right…I’m writing unit tests!” Then it reads some files. Then it thinks again, 42 seconds, same BS. Reads a few more files. Thinks again, 37 seconds. WHAT THE ACTUAL FRICKIN HECK?!?!

This is starting to feel like a LEGIT SCAM. It is pointless, useless, and utterly wasteful of my time and my money, and the sole purpose, it’s starting to seem, is to line the pockets of OpenAI execs. Because this thinking is mundane, trivial, and primitive. It is needless and useless, and seems to offer no fundamental benefit to the BASIC task of writing unit test code (something, it seems, just about any model can do, and do well, without any thinking capabilities at all…heck, generating massive amounts of unit test code was the first thing I started doing with GitHub Copilot years ago, back when it was just tab completions!)

This issue came to a head after I spent a day troubleshooting a bunch of broken code GPT-5 produced yesterday morning and the day before. Code that built, but had plenty of runtime issues. It turned out to be really crappy code! I even spent about half an hour DEBATING the nature of TypeScript assignment restrictions with the stupid thing. I’ve been programming in TypeScript for over a decade; I’ve lived and breathed it nearly every day for 3600 days! Not only was GPT-5 dead wrong, it was demonstrably wrong: I would literally edit the code myself and fix it the way I was instructing the model to fix it (which it would NOT do, especially after wasting 40-60 seconds “thinking” about it!), without any build or runtime errors. So not only is the model wasting my time with its useless thinking cycles, it’s even DEBATING me on what it thinks are the fundamentally necessary reasons why it had to do what it was doing.

In the end, what it was doing, or trying to do, was bypass the very fundamental reason why anyone would choose to use TypeScript: for the type checks! But GPT? Ah, nah, who needs type checks when you can just ( as any).do.whatever.the.f.you.want()!?!?!?
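
For anyone who hasn’t fought this particular fight, here’s a contrived TypeScript sketch of the difference (not the actual code from my project): the `as any` escape hatch compiles happily and misbehaves at runtime, while narrowing the value keeps the compiler in the loop.

```typescript
interface User {
  id: string;
  name: string;
}

function save(user: User): void {
  console.log(`saving ${user.id}: ${user.name}`);
}

// Shape is unknown at compile time: `JSON.parse` returns `any`.
const raw = JSON.parse('{"id": 42}');

// The shortcut GPT-5 kept reaching for: silence the compiler.
// This compiles fine, then silently misbehaves at runtime
// (id is a number, name is missing entirely).
save(raw as any);

// The entire point of TypeScript: validate at the boundary,
// so bad data fails loudly instead of leaking into the app.
function toUser(value: unknown): User {
  if (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { id?: unknown }).id === "string" &&
    typeof (value as { name?: unknown }).name === "string"
  ) {
    return value as User;
  }
  throw new Error("not a valid User");
}

save(toUser(raw)); // throws here, where the problem actually is
```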

There is something very wrong with GPT-5’s thinking. More and more it seems to be pointless waste that offers no actual benefit to the process. In fact, after researching thinking models recently, and especially after coming across an article from Anthropic about how prolonged thinking cycles actually degrade results, increase hallucination, etc. (now it makes sense why their cycles average around 1-5 seconds and rarely top 11 seconds in my case!), I am starting to think that GPT-5’s thinking is actually to my detriment.

I am EAGERLY awaiting non-thinking versions of GPT-5. Especially if they come at, say, a 0.5x cost rate vs GPT-5 :brain:, and especially if, WITHOUT thinking, they actually write good code!

In the meantime, I think it’s back to Claude. My experiment with GPT-5 has shown it’s not as solid a model as it seems, or was hyped to be, at least not once you get into the weeds of what it is actually doing. It’s doing some pretty bad stuff in those weeds!

AI has been nerfed. Your best bet is getting over it. Going forward it’s pay-for-play, and it’s only going to get more expensive unless there are scientific advancements made in the field that will allow for better efficiency. Yes, I know, it ■■■■■.


Ah!! GPT-5 strikes again! It just decided to nuke changes I manually added to resolve a runtime issue…wait for it…to make ITS LIFE EASIER when badly implementing unit test cases, because it didn’t want to create the required mocks!!

[Screenshot: 2025-08-19 at 1.04.52 PM]

I mean, seriously, WTAF?!? It is one thing to FIX production code because a well-written test fails. It is an entirely different thing to BREAK production code because you’re being lazy about your badly written test cases and don’t want to add additional, NECESSARY, mocks! (What the heck did they train this thing on??)
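
To make that concrete: when the unit under test has a dependency you don’t want to exercise, you mock the dependency; you don’t gut the production code. A minimal sketch, assuming a Vitest setup and hypothetical `./email` and `./user-service` modules, where `registerUser(address)` internally calls `sendEmail(address, "Welcome!")`:

```typescript
// user-service.test.ts
import { vi, test, expect } from "vitest";

// Replace the dependency in the test, leaving production code untouched.
vi.mock("./email", () => ({
  sendEmail: vi.fn().mockResolvedValue(undefined),
}));

import { sendEmail } from "./email";
import { registerUser } from "./user-service";

test("registerUser sends a welcome email", async () => {
  await registerUser("dev@example.com");

  // Assert against the mock instead of hitting a real mail service.
  expect(sendEmail).toHaveBeenCalledWith("dev@example.com", "Welcome!");
});
```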

WHY is this model thinking for 30, 40, 60 seconds or longer every time it “thinks”? It does not seem to be providing any benefit…

Anyway. I would really love to try out a non-thinking GPT-5 model. Non-thinking Sonnet is amazing: it handles the vast majority of the work I require with ease, is faster than the thinking model, and costs half as much. I really wonder if it might be the same with GPT-5. Is the excessive thinking causing these problems? (It could well be that the model was just poorly trained…however, the only way to tell would be to try a non-thinking version of it.)

I don’t know what’s happening with the industry. I read an Anthropic article about how prolonged thinking actually degrades results. So I really wonder if the “nerf” is that these companies are trading effectiveness, accuracy, usefulness, etc. for massively expanded output token counts so they can rake in the dough.

A ridiculous strategy, IMO, though, as people aren’t stupid, and a ploy like that is obvious to those of us watching (and dependent upon) these models.

It’s curious to me that every GPT-5 model Cursor has started out with here is a thinking model. I find that VERY CURIOUS! Why aren’t there non-thinking options? OpenAI seems to offer GPT-5 with and without reasoning, so there was no reason every GPT-5 model in Cursor had to be :brain: :brain:. It’s starting to make me suspicious…but, nevertheless, I think it’s time for some non-thinking versions. That shouldn’t be any harder than integrating the thinking ones.

I wonder whether adding non-thinking versions of GPT-5 would reveal just how bad GPT-5’s “reasoning” abilities actually are, and how much they are degrading results? :thinking:

I’m not sure either. I’m actually a Cursor Ultra and Claude Code Max member; I signed up for Claude Code Max after Cursor released their new pricing scheme. After using CC, I noticed there is actually a huge disparity between the number of tokens CC uses versus the number of tokens Cursor uses with Claude models. I tried to post about it here, but my post was automatically flagged and they hit me with the “You should just email billing” load. I’m not sure what’s going on, but I used 1.2 billion tokens in Cursor this billing cycle. I’m not sure how many I’ve used with CC, but whenever I query it, the token usage never surpasses a few thousand tokens at a time. It doesn’t make any sense, but I’m sure there’s a reason.

Yeah, I think it’s all the “reasoning” tokens. I shared this previously, but for convenience:

I think the whole concept of “reasoning” models is primarily to pump up the token counts. Reasoning models generate a lot more output tokens than normal models do, and you primarily pay for output tokens. The more reasoning a model does, the more it’s going to jack up the output token volume. I believe most people use either the GPT-5 models (all :brain: right now) or the claude-4-sonnet :brain: model. Interestingly, the default/baseline model is claude-4-sonnet (non-:brain:), but even though I’ve shared the above and numerous other articles and videos showing that reasoning models aren’t necessary for most Cursor agent/code-gen usage, people still use sonnet :brain:!
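
To put rough numbers on that (both the rate and the token counts below are made up for illustration, not real pricing):

```typescript
// Back-of-the-envelope only: hypothetical output rate and token counts.
const OUTPUT_RATE_PER_MTOK = 10; // assumed $ per 1M output tokens

const visibleAnswerTokens = 1_000; // the code/prose you actually see
const reasoningTokens = 9_000;     // hidden "thinking" tokens, billed as output too

const nonThinkingCost = (visibleAnswerTokens / 1_000_000) * OUTPUT_RATE_PER_MTOK;
const thinkingCost =
  ((visibleAnswerTokens + reasoningTokens) / 1_000_000) * OUTPUT_RATE_PER_MTOK;

console.log(nonThinkingCost.toFixed(3)); // 0.010
console.log(thinkingCost.toFixed(3));    // 0.100 -- 10x the cost for the same visible answer
```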

I think people are wasting a lot of money on thinking models when they simply do not have to. Sadly, we have no option with GPT-5 in Cursor right now, and we really, REALLY SHOULD (@condor ^). GPT-5 seems to do an EXCESSIVE amount of “thinking” that I now find to be utterly useless, and possibly even degrading. I generally prefer sonnet (non-:brain:), but I would like to see how GPT-5 (non-:brain:) compares.

In any case, I am a Cursor Ultra user, and I have poked around with Claude Code. I also noticed that Claude Code token usage was dramatically lower. That said, even with non-thinking sonnet, Cursor’s token usage was higher, and I think there are some reasons for that. FIRST, Cursor is a full-blown agentic IDE, whereas Claude Code is NOT! There ARE differences in token usage there…there have to be. So, to be fair to Cursor, it’s actually doing something different than CC, and that to a degree necessitates more token usage. Some things that will ramp up token volume with Cursor:

  • System Prompt (which I think is non-trivial with Cursor)
  • Explicitly attached code file context
  • Documentation context attachments
  • Rule attachments (I know CC has some of this, but it seems it’s generally a lot simpler…I have some extensive Cursor rules!! Maybe they could be simplified, but still, a lot of tokens are involved here!)
  • Chat tab context attachments (optional, but an option with Cursor)
  • Open code tab context attachments (also optional, but an option)
  • Memories (you might not know about them; Cursor has them, and they tend to be auto-generated)

Something else one of the Cursor reps mentioned in a recent post was that they have to send the ENTIRE chat conversation, and all its context, to the model for every TOOL invocation… That caught me off guard. I figured it was every prompt, but no, it’s every tool invocation. They also explained that a tool is basically any of the agentic things Cursor supports in an agent chat:

  • Reading file
  • Grepping files
  • Executing terminal command
  • Editing code file
  • etc.

Since this is per tool call, rather than per prompt, and for any given prompt I figure there could be a dozen to dozens of tool calls, I suddenly started realizing how it can all add up.
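
Here’s a rough sketch of why that accumulation gets so big so fast (every number here is hypothetical, not a measured Cursor value):

```typescript
// Hypothetical sizes: the point is the shape of the growth, not the exact figures.
const baseContextTokens = 20_000; // system prompt + rules + attached files (assumed)
const tokensAddedPerCall = 2_000; // each tool result grows the conversation (assumed)
const toolCalls = 25;             // "a dozen to dozens" of calls for one prompt

let conversationTokens = baseContextTokens;
let totalInputTokens = 0;

for (let call = 1; call <= toolCalls; call++) {
  totalInputTokens += conversationTokens; // the WHOLE conversation is re-sent each call
  conversationTokens += tokensAddedPerCall;
}

console.log(totalInputTokens); // 1100000 -- over a million input tokens for one prompt
```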

NOW, the Cursor rep also mentioned they use caching, which is why you see cached tokens in the usage log on your Cursor account billing page. Those cached tokens, which constitute the bulk of the total, greatly reduce the effective token usage/cost. I am not sure we are really using “a billion” (plus!) tokens each billing cycle (or even each week, as a coworker of mine does!). If you look at how much of that billion is actually cached tokens, the real token usage drops dramatically.

I do wonder, then, if you compare total token usage excluding cache to CC’s token usage, whether they become more similar… The purpose of the cache is to avoid paying full price to reprocess tokens the Cursor agent has already sent, since the agent is progressive: you add to the existing total context (including the full chat history) every time you call the LLM API, so caching can work really well. Claude Code may just not be fully transparent here…
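
Continuing with the hypothetical numbers from the sketch above, and assuming the kind of discounted “cache read” rate most providers advertise (the exact rates here are made up):

```typescript
// Caching doesn't stop tokens from being counted; it bills the already-seen
// prefix at a much cheaper "cache read" rate. Hypothetical rates below.
const totalInputTokens = 1_100_000; // from the previous sketch
const cachedTokens = 1_000_000;     // prefix already seen on earlier calls
const freshTokens = totalInputTokens - cachedTokens;

const INPUT_RATE = 3;        // assumed $ per 1M uncached input tokens
const CACHE_READ_RATE = 0.3; // assumed $ per 1M cached input tokens

const naiveCost = (totalInputTokens / 1_000_000) * INPUT_RATE;
const actualCost =
  (freshTokens / 1_000_000) * INPUT_RATE +
  (cachedTokens / 1_000_000) * CACHE_READ_RATE;

console.log(naiveCost.toFixed(2));  // 3.30 -- what the raw token count suggests
console.log(actualCost.toFixed(2)); // 0.60 -- what the cache-heavy reality looks like
```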

I do think that reasoning models, with all their reasoning tokens, are wasteful! I think GPT-5, Gemini, and Grok (which may be the worst) use an excessive number of reasoning tokens without any good cause, and likely to the detriment of the final outcomes (if Anthropic’s study is to be believed). In any case, I’m rather soured on long-thinking models. I think they are burning my money for no good reason.

@Stephan Well, I am not sure how you are doing things, but gpt-5-mini has been rather pitiful so far. I’ve been using it for hours now, and it’s even worse than gpt-5. I think I have to wrap up my experiment with GPT-5, as it’s just proving to be inferior, at least for much of the kind of code I design and write. It is not the worst model, it’s better than Gemini and some others, but it’s been making some strange decisions, and it keeps doing things I don’t like (such as (T as any) to get around necessary and legitimate TypeScript type-checking rules!)

Anyway. It’s likely I’ll be using it again, as it’s half the cost of Sonnet, I suspect I’ll run into my limits with Sonnet sooner or later, and GPT-5 is still better than the other top alternatives. I also like that when I need small, surgical edits, gpt-5 (normal) seems to do well there, better than sonnet. It’s just these odd decisions it’s been making in my code…I am having a real tough time getting it to stop (even with rules), and some of them are just bad.

Just thought I would share this. This was a TRIVIAL task. Took over three minutes of “thinking” to get it done:

A single thinking session was 1m18s!! I gave the agent all the relevant context, code files, etc. It baffles me why it had to think for over a minute to figure out what to do…

Please add non-thinking GPT-5 models. (Should have used sonnet for this, not entirely sure why gpt-5 was used.)