Hi, could you share a bit more details? It does not seem like its bad.
Claude 4 Sonnet has regular and thinking option. You may need to check your prompts with any new model as they have different training and behaviors. So just using the same approach like with previous models may not work as well.
I found 4 to be definitely better than 3.5 and I was mostly using 3.5 because 3.7 was just to eager to make changes you didnt need.
Your claims do not match with other observations and industry best benchmarks.
Interesting that you guys have had positive experiences! Makes me keep my hopes high, because its personality isn’t that bad and I feel like it almost works, but it just falls through on “sense of reality” for me.
I’ve had some level of succes with it, but too often it does something along the lines of running a term command as such: “echo ‘building’” and then saying “perfect! Its building! Try and test it” - as if it’s simulating logs from an actual process but it’s just the logs it has created itself, and then interpreting those logs as output from the program. Like intentional delusion almost haha
It has also multiple times written “test scripts” that is litterally just 40 lines of console logs saying “testing x…”
I have tested it on a swift/electron project using AVFoundation for video stuff
I have been switching between 3.5 for frontend, 2.5 pro for refactors, large edits and codebase information gathering and 3.7 for small scoped complex issues
Claude 4 is a new model so there are likely edge cases that Anthropic and Cursor may have to adjust. Now that its publicly its good to report bad respoonses with the thumb down in chat so they get info if things go wrong or not.
Otherwise feel free to share here some of your approaches and several of us forum members will try to help you if we see anything that could be improved from our experience.
I have actually completely missed the thumbs down feature, even though I have been using Cursor for more than a year, so will make sure to use it
Glad to hear it works for you guys, I will try and test it out some more then with different approaches, as it may be my form of interaction that screws it up
Yea, Claude 4.0 refactored a very complex project in like 15 and fixed major bugs that other models were choking on. Well worth the $. For those complaining about cost, go somewhere else or don’t use AI. Instead of driving 30 miles, you can walk! That’s your option.
On my end, Sonnet 4 is horrible… Like horrible horrible. It doesn’t respect architecture, suggests deletion as if completely clueless on the codebase. Gets stuck running python commands. Switched to gemini 2.5 pro and it solved it instantly. I was really open to working with it (especially at 0.75x cost). I wonder if it’s overloaded or something (?)
For me it’s not that I dont want to spend money or complain of the cost being too high for quality LLM’s, simply that if I spend premium prices I expect a premium model! If I can get 3.7 or gemini 2.5 for equivalent prices, why would I spend more for a seemingly worse model - but It turns out that experience is not universal
3.7 uses Cursor tools better than Gemini 2.5 Pro.
4.0 is weird. It has a problem with formatting C# code. And also it can call the edit tool for one (Kotlin) file many times, instead of making all edits at once (like Gemini).
But we must give credit to Anthropic for offering a discount during the initial period after the release.