GPT-5 is really bad (at least in Cursor)

I don’t know if it’s Cursor or GPT-5, but my experience with GPT-5 so far has been very underwhelming. It makes weird choices, doesn’t remember or follow instructions well, and the code it writes hasn’t been great.

The worst part is that it’s so darn slow! It is absolutely painful to use! Sure, I can ask GPT-5 to refactor something, but it will be faster if I just do it myself. I’m back to Claude already. It must work well if any of the benchmarks are to be believed, but I haven’t seen it. Not in Cursor anyway.

On a relatively straightforward code change in a single file, it added duplicate calls to the backend “just in case”, according to the comment it also added explaining it :frowning:

39 Likes

Hey, thanks for the report. We’ll work on improving this model’s performance soon.

4 Likes

After switching from Claude 3.5/4.0… my 2 cents after trying all day so far to get GPT-5 to write production-quality Python: just don’t.

It’s next to useless compared to Claude.

It randomly ignores rules, overlooks obvious bugs even when you’ve highlighted them and provided an ALL CAPS explanation of the problem, makes absurd decisions (instead of fixing a regex, it added 100 lines of custom handling code into a function literally named “generic_downloader” to handle an edge case the downloader was already built to understand, had the input regex been correct), and keeps hardcoding the year 2024 in my auto-updater code…
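For what it’s worth, the hardcoded-year bug above is trivial to avoid. A minimal sketch (the function name and URL layout here are hypothetical, not from the poster’s project): an auto-updater should derive the year at runtime rather than baking in “2024”:

```python
from datetime import datetime, timezone

def build_release_path(base_url: str) -> str:
    # Hypothetical helper: look up the current year at runtime instead of
    # hardcoding "2024", so the path stays correct after a year rollover.
    year = datetime.now(timezone.utc).year
    return f"{base_url}/releases/{year}"
```

This is exactly the kind of one-liner a model shouldn’t get wrong twice in a row.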

10+ glaringly obvious screwups in an hour, which I’m certain Claude wouldn’t trip over. Basically… the quality hit you take from this model just isn’t worth it.

I reverted all the work it did in the last 2 hours (exactly 27 prompts’ worth according to my usage), then asked “claude-3.5-sonnet” to read the chat and fix the problem.

It got everything correct on the first try, with just that one prompt.

Anthropic must be so happy today!

4 Likes

Maybe the Cursor prompts need to be adjusted to suit it better; it’s still new. It insists on using absolute paths for every file, and it doesn’t seem to know or remember that it has a dedicated terminal. It’s just not integrating well.

2 Likes

Yes, for sure. The first thing I noticed was it struggling with (and wasting loads of tokens on) a contradiction it had to try to resolve. My rules tell it to output a heading, but its system rules tell it never to use markdown “#” headings, with a few exceptions, and some other system rule tells it to use bullets, contradicting both the exceptions and my own rules. The Cursor team has also removed all priority inclusions (it used to be that if we gave it our own rules, they took precedence), but now there’s no such thing, so it randomly panics about all this stuff with no clear way to resolve anything… then gets it wrong.

2 Likes

Ever since angrily posting this message I’ve been trying GPT-5-Fast, and while “fast” is only relative, I feel that this model performs much better. GPT-5 (regular) couldn’t complete a simple task, “fast” does just fine, and with minimal tool calls which I like.

It seems that there may be more going on than just speed.

1 Like

GPT-5 is a waste of time, not as good as Claude and Gemini

2 Likes

So far it seems similar to Gemini, although I have to instruct it to use tools; it doesn’t seem to know by default that it can use terminal commands, e.g. to compile my code.

Claude goes a little crazy with the commands and GPT-5 barely uses any, but once I tell it to, it does, so that’s OK if it’s free and I don’t have to waste endless requests telling it what to do. So far it’s not bad, although the context nuking ■■■■■ (Claude handles context nukes pretty well, probably the best so far)… but I need to do a lot of hand-holding with Claude… at some point I can’t deal with constantly seeing “you’re absolutely right!” (which I am, but still, it should know better!!)

Test it out; the more we test and provide feedback, the better it will be. That’s why it’s free.

I agree. Thought I’d give it a go on a coding task that ‘Auto’ (Claude?) and Gemini have both failed at spectacularly. GPT-5 was even worse. It didn’t appear to listen to direction and went round in circles; the only thing it seemed to do better was not introducing as many compile errors as the other models - but overall deeply disappointing compared to the hype in the press.

2 Likes

What’s a “context nuke”? I’ve found that if I’m on one task, I keep the chat open, and I can go hundreds of messages in, and it almost always stays bang-on-track. If I do something dumb, like have a tool feed an entire monster DOM into the context - I just go back to the prompt above that point, and continue.

I also have success using the “duplicate chat” option and doing 2 similar things at once from that fork onwards.

This is my favourite claude-4 moment so far this week - that’s it patting itself on the back for stuff it worked out on its own :slight_smile:

[screenshot: pic_2025-08-06_16.28.07_2016]

I agree. Three major issues:

  • I had to tell it about my rules files (or at least remind it with an @-link)
  • I had to tell it to use tools (e.g. tell it to lint after code changes)
  • More of a UX thing, but I like that Claude explains what it is doing (and answers questions) before changing code. GPT-5 is totally silent; sometimes it will answer my questions, but always after it is done with everything

GPT-5 seems best at some kinds of problem solving, and writes code better than anything but Claude (Gemini cost me more time than it saved; K2 seemed better), but Claude still seems to be the best assistant and creates code that follows rules better.

Is this just a matter of working on the integration with Cursor? The model really does seem a promising and less expensive replacement for Claude-4, but it needs to match the rules, tools and communication better first.

[Edit:] I might be jumping to a conclusion here, but it seems the time to write the first code is an order of magnitude slower than Claude, yet I swear it spits out all of the code nearly instantly once it gets there. Maybe that is better token-wise, or maybe it is just the UX waiting to show updates… anyone else?

2 Likes

Just did half an hour of simple coding with GPT-5 Fast (non-MAX).

It gets some work done. It was surprisingly even better than Claude Code at some tasks, but then again, it is not very good at understanding direct orders or implied orders (“look at this example, and follow it”), and moreover:
It is ■■■■ expensive.

Half an hour of simple coding clocked in at $17 so far.

I would give it a solid 5 / 7 but I certainly would not pay for it.

Also, “fast” is not really “fast”. But I guess it will become “fast” once it is not free anymore.

It is like the typical GPT series… cherry-picking what it wants to remember from its context. Claude and Gemini are far better when it comes to memory.

1 Like

I’ve been using it for a few hours continuously on MAX (which doesn’t add much for this model anyway… and I’m not sure I’m using the extra space yet). Based on the usage report, it is significantly cheaper than Claude-4-thinking, and it seems to use fewer tokens too. Based only on total tokens it is a little under half the price so far, but it seems to do a lot more with fewer tokens; it feels like it comes out closer to 1/8th the price. Tomorrow I am going to try a feature big enough that I’ll work with GPT-5 to make a design doc, then a tasks doc, then let it run. So far I only trust Claude to get me to about 80-90% complete after one of these runs. It will be an interesting test.

1 Like

Hmm, strange… are you using it in Fast?

looking forward to your comparison tomorrow.

Just to point out, both OpenAI and we at Cursor are seeing a lot of GPT-5 usage, which may mean requests are not at their “usual” speeds. While I cannot guarantee they will improve in any given timeframe, I would expect improvements within the coming days.

4 Likes

Works perfectly for me; it solved things that I could never solve myself.

6 Likes

I have used this for 5 minutes, and it can’t even get a single fix right; it’s struggling to turn off duplicate browser opening for my daughter’s little program. Auto mode did a better job, even though that was pretty useless. We have gone from programming really well a few weeks ago to near useless again.

This is so weird. I am getting really, really poor work, so I’m reverting back to Claude 4; it’s the only version that’s half decent. Everything else is shockingly bad, unless it’s Cursor that’s useless. CURSOR? But it’s the only programming app, apart from the new software coming that I will try soon, once it’s out of the really slow and tedious beta stage: TRAE. Early signs are good, and I can get access to Claude 4 again for just $3 this month and $10 next month. Let’s see how it does first; Cursor might still come out on top. I will also test with GPT-5, and then I will know if the platform is the issue and not the model. So far I’m impressed with the in-depth analysis, although neither checks my dev logs or readme first; the applications should read all the code before making any changes, but they both don’t. FLAW… Oh, o3 isn’t that bad either, but so darn slow. If you could speed up this model I think you MAY have something. Anything else, don’t waste your time with. Update: GPT-5 fails instantly. It can’t do anything; I get an error. It appears it’s not Cursor after all. What versions are you guys using to get GPT-5 to work effectively??

1 Like

What programming language?

Works very fast for me, though I don’t know what speed they planned. It works no slower than o3 did for me.

1 Like