Agree.
I have kept trying with GPT-5-fast, because only half of the possible gains from a new model come from the model itself; the rest come from adapting your prompting to better suit it.
That said, I have some feedback for the Cursor Team @danperks
I have found GPT-5 to be very clever and to write very good code, but only in a single file. I built a new feature with it and just kept everything in a single, very long file (3,000 lines by the end). GPT-5 kept delivering excellent code and changes the farther along I got. I mentioned before that it feels like GPT-5 just isn’t integrated well into Cursor, and this confirms my suspicion. I have found that GPT-5 does not:
- read files, snippets, or terminal text that I have tagged with @ (when I paste the content in manually, the agent responds to it)
- “see” my current tab; it goes searching through the codebase to try to find the code I’m talking about
- show any awareness of the environment or the terminal
This is all fixable, I’m sure. After adapting to these limitations I’ve been able to get great results, and it’s my preferred model now, although it is a step backwards as far as Cursor functionality is concerned.
I worked a lot with classic GPT-5.
Linking files into context really seems to make no difference. It just likes to read files and grep for things itself.
I am also getting lots of messages like “Model provided invalid argument to apply_patch tool” and “The model made no changes to the file.”
Otherwise GPT-5 really seems to understand code and solve nasty bugs on first try.
I’ll have to try GPT-5-fast, as people seem to be happier with its behavior.
I’ve noticed this too. It works well for smaller scripts, but when you’ve got a large workspace, it struggles. It’s like it doesn’t realize it’s in agent mode. It makes the same mistakes OVER and OVER and OVER again, and context from two prompts ago is forgotten immediately. I think GPT-5 is better than Claude 4 at coding in general, but in Cursor, GPT-5 doesn’t seem to realize it’s in Cursor and gets confused.
Also
- it doesn’t create or have access to the to-do list
Since launch I’ve been working with GPT-5 exclusively (the first day just to test it, but…) across 4 different projects, and it’s my default model now. For some reason it’s the most consistent yet: very concise, and it does everything correctly, whether bug fixes, feature building, or code review. People I know say it’s ■■■■, but I think they are using GPT-5 with broad prompts? I’m doing very controlled prompting, and it’s better than Claude and Gemini 2.5.
The issue I find is that it does NOT use Cursor’s to-do tool. If you ask it to create and track a to-do, it does it in a half-baked way using text files, often forgetting to update them as it works. The other issue is that even if you provide detailed specs with a to-do list, it’ll only do things in small chunks, then stop and announce what it will be doing next, waiting for me to say “Continue”. It doesn’t matter if I ask it to keep shipping until the to-do list is complete; it always stops and waits for me. Very irritating.
Yes, one of the biggest issues is that it doesn’t use tools right. It never uses the to-do tool either.
Please note that todos are disabled for GPT-5 until we improve them for this model.
I keep restating this: in T3 Chat, GPT-5’s output is much more like Claude’s. In Cursor it seems like GPT-5 can’t get the tool calls right, wanders off scope, and interacts in chat drastically differently than in T3 Chat. These smells lead me to believe there are some Cursor-side issues going on.
That said, I couldn’t get it to move an inline function into a useCallback in the same file without it failing badly, so it is hard to believe GPT-5 itself is any good at real code. It continues to feel like an LLM that impersonates an engineer, sometimes convincingly, sometimes comically badly. It just wants us to love it so badly that it is willing to live with lies… same old GPT story. Claude still feels like an eager junior-dev assistant that accelerates my work.
Good to have confirmation on that.
I kept telling it to use todos and at one point it resorted to creating a markdown file and tracking a checklist there.
I second that! I’ve been working exclusively with GPT-5 since launch, and the performance has been amazing. It developed two very complex features, most of the time without any errors, bugs, or oversights.
I am also doing very controlled prompting, and I discuss the feature design with it until we are perfectly aligned, so that probably helps a lot. Overall, hands down the best performer in my book!
I have to say I’ve had essentially the opposite experience. GPT-5 has been great at long-running, multi-step tasks, has been more accurate than Sonnet 4 (especially with large refactors), seems fast enough (though I haven’t paid much attention to that, really), and appears to be much more coherent in its thinking steps than Sonnet 4.
I think the comparison to Claude Sonnet 4 is what makes it feel bad. Claude Sonnet 4 is incredible, but over the last few days GPT-5 has been starting to do better. If you want to know a truly terrible one, try Gemini CLI. I loved how the shell program looked, but it ruined every script we started on! I’m sure it’s better if you give it documentation, context, and a playground, but it couldn’t remember anything and kept ruining what we did. I’ve had some luck with GPT-5; it’s just really buggy.
I’ve been finding that whether GPT-5 or Sonnet is better depends on the task. Sonnet is wonderful! It has been my go-to model as long as I have unused usage on my plan.
That said, GPT-5 is good, and in some cases better than Sonnet. The area where I’ve found GPT-5 to be much better is implementing Next.js apps. Sonnet was amazing! I have repeatedly been amazed at how awesome the apps Sonnet generates are! However, as my apps have grown, Sonnet’s ability to handle them has, I think, diminished, and I’ve started to notice unwanted changes unrelated to my prompts.
GPT-5 however, handles my existing Sonnet apps with ease, does a phenomenal job updating them, is highly targeted with its updates (it doesn’t meander or bulldoze), and it seems just as creative as Sonnet when it comes to interesting and awesome UI features.
Sonnet still seems to do better on certain kinds of code, including a lot of backend work, especially new broad-strokes stuff. Again, though, GPT-5 is more surgical when the need arises to update existing code in a very specific manner.
I used to use Gemini to do a lot of my planning, but I find more and more that Gemini is a bulldozing, scatterbrained nitwit on most tasks; GPT-5 is vastly superior to it. I mostly used Gemini to plan in the past to avoid burning Sonnet usage, but GPT-5 comes with twice the usage of Sonnet (and more), so now it’s my go-to for planning tasks. Being more surgical, GPT-5 is also great for updating stories that need tuning. Sonnet often makes unwanted changes to stories every time it’s run (I don’t know if it can actually make targeted line-by-line or even char-by-char updates; Sonnet always seems to regenerate the entire file/story/etc. and is therefore always changing things it should not).
When it comes to things that need good creativity, Sonnet generally seems better. It just seems to have more of a creative mindset than GPT-5. It’s not that GPT-5 cannot be creative; Sonnet just seems better at it.
GPT-5, however, seems to understand the fundamentals of architecture and software design a bit better than Sonnet. Both understand them well, but GPT-5 has so far given me better results there.
=====
On the front end (UI/UX), I greenfield with Sonnet, then do the rest with GPT-5. GPT-5 is just better than Sonnet for long-term web/mobile app work.
On the back end, GPT-5 seems incrementally better, notably on things that require surgical attention and architecture/design work. Sonnet seems to do well on larger-scale refactors and entirely new sections of code.
Now that Anthropic’s models have become a joke (both Opus 4.1 and Sonnet), GPT-5 outperforms them easily. If you provide it structured documents and structured investigative prompts, it outperforms everything Anthropic currently has, and it’s not even close. Even GPT-5-mini outperforms ALL Anthropic models, and you know how expensive Opus 4.1 is?
If Anthropic keeps this up, they won’t exist anymore in a year or two.
Such hyperbole. There is an infinite number of use cases, and different models are better at different things. Honestly, NEITHER Sonnet NOR GPT-5 can do a ■■■■ thing without being coached continually. Together, however, they can check each other and produce good results. GPT-5 is often a terrrrrible TypeScript/JavaScript programmer in my experience; it just follows instructions better. Sonnet is often much better (and 20x faster!). Opus is the king, but the cost is mostly not worth it. If Anthropic got Opus to prices competitive with Sonnet and got it to follow instructions as well as GPT-5, it would be unstoppable. As it is, they’re all excellent in their own ways, and it’s great to have them all working together in Cursor.
Similar experience for me. In an existing large codebase, GPT-5 constantly ignores rules, modifies the wrong piece of code, reasons badly, duplicates imports, etc. Sonnet 4 was/is working much better for me.
I’ve been using GPT-5 since its release and I’m absolutely satisfied; for my projects (Tauri app with Rust and game development with Unity) it’s the best model at the moment.
This topic did not age well. gpt-5-high with Max Mode simply beats the s* out of Claude 4 Sonnet.