Where does the bug appear (feature/product)?
Cursor IDE
Describe the Bug
OK, I just had a rather rogue problem with GPT-5 that I think burned up a whole bunch of tokens. This really doesn’t make me happy, quite the opposite actually. I had finished a bunch of work about 15 minutes ago, ran my test suite, and there were about 30 failing test cases (out of 1500).
As is my normal course of action, I grabbed the error messages from the terminal, pasted them into a new agent chat, gave it a prompt with the project reference as context (since test errors were strewn throughout the project; some low-level changes were made to data services and a few logic services), and let it rip. I then stepped away for about 15 minutes.
I just came back, and GPT-5 had not fixed ONE SINGLE TEST!! NOT ONE! The entire chat was a cycle of 23s-1m12s thinking spells, separated by a few reads of files. This went on, and on, and on, and on, and on…and NOT ONE IOTA of actual unit test fixing was done.
GPT-5’s thinking is WILDLY EXCESSIVE, totally wasteful, and seems to get it into trouble on relatively mundane prompts like “In @project/ these tests are failing. Fix the tests, but be careful not to change any of the @Recent Changes as they are intentional.”
I reverted back up to my prompt, switched to Claude 4 Sonnet, and the thing thought for a grand total of 21 seconds and fixed all the broken tests in about 2-3 minutes…
I…WAT!!!
I have run into this issue with GPT-5 more and more as time goes by. I am not sure if the model itself is being tweaked, or if there is something in the Cursor agent and how it uses the model that is causing this (I’ve upgraded several times since first starting to use GPT-5), but it’s really quite ridiculous. To have so much time spent thinking…I mean, I’d have to say half or more of that ~15m period was the model “thinking”, with little spurts of reading files in between. So, 8, 10, 12 minutes of thinking?? To do NOTHING?
Whenever I switch back to claude-4-sonnet (thinking and non-thinking), the experience is so different. This model, and its integration with the Cursor agent, is…I guess I would have to say very refined. It just works. It doesn’t fuss or muss about anything. It isn’t super picky about how you craft your prompt…even if your prompt isn’t highly accurate, the model seems to get what needs to be done and does the job well.
I honestly don’t know if that is just a model thing? Perhaps this is purely a GPT-5 problem and there is nothing that the Cursor team can do to fix it. In that case, I guess I just have to deal. I am trying to figure out GPT-5 just in case anything happens with Anthropic here, but boy, it has not been an easy journey. GPT-5 is no Sonnet/Opus replacement, IMO.
If this is a matter of refining the Cursor agent and how it uses GPT-5, though, then I feel some effort does need to be made here. Nearly 15 minutes of the agent and model chugging on a problem without a single edited character, burning tokens the entire time, is a very serious problem. That is not just waste at a personal level, using up my plan’s allotment of tokens on needless, useless “thought” by the model; it wastes global resources as well, which is a growing problem with AI usage in general, and particularly with AI use for writing code. Numerous articles have covered the sheer amount of waste derived from vibe coding efforts and general agentic IDE usage overall.
Anyway, this was a particularly egregious case, in part, I guess, because I had to step away. Usually when I see GPT-5 going down a wasteful path I’m here to stop it; maybe they would all end up like this if I were not. In any case, I feel there is something very wrong here, it seems fundamental to the nature of the current agent->GPT-5 integration, and I am hoping it’s a matter of refining that integration. Because that’s the word that comes to mind every time I switch back to Sonnet: highly, optimally refined.
Steps to Reproduce
Not exactly sure how to replicate. It seems to be a somewhat arbitrary problem, but it occurs often enough, and seems to be a deep enough problem, that I’m writing this.
Expected Behavior
For the agent to actually solve the problem at hand, without spending exorbitant amounts of time “thinking” (perhaps just introducing GPT-5 non-thinking options would solve this problem right off the bat, as it seems to be fundamentally related to thinking cycles with GPT-5).
Operating System
MacOS
Current Cursor Version (Menu → About Cursor → Copy)
Version: 1.4.5 (Universal)
VSCode Version: 1.99.3
Commit: af58d92614edb1f72bdd756615d131bf8dfa5290
Date: 2025-08-13T02:08:56.371Z
Electron: 34.5.8
Chromium: 132.0.6834.210
Node.js: 20.19.1
V8: 13.2.152.41-electron.0
OS: Darwin arm64 24.5.0
Additional Information
I apologize. I was rather ticked off when I canceled the GPT-5 session, and straight up reverted to the prompt, chose Sonnet, and re-ran the prompt (which immediately started fixing the issues and was done just a few minutes later). I should have captured a screenshot of the issue, and I’m kicking myself now for not doing so, since I…well, DO NOT want to try to replicate it again, given how many tokens I think were burned uselessly the first time around.
Does this stop you from using Cursor
No - Cursor works, but with this issue