Sonnet 4 is really bad?

BertramWittrock · May 23, 2025, 9:50am

Not a Cursor issue of course, but it lowkey just seems ■■■? Both on benchmarks and my initial impression.

It created a “verification script” that was just 20 lines of console logs and said “perfect! Now we have tested”

I know its not a reasoning model, but their pricing is insane for the quality you get, and its definently not “the best coding model in the world”

3.5 is better lol

T1000 · May 23, 2025, 10:05am

Hi, could you share a bit more details? It does not seem like its bad.

Claude 4 Sonnet has regular and thinking option. You may need to check your prompts with any new model as they have different training and behaviors. So just using the same approach like with previous models may not work as well.

I found 4 to be definitely better than 3.5 and I was mostly using 3.5 because 3.7 was just to eager to make changes you didnt need.

Your claims do not match with other observations and industry best benchmarks.

ziv · May 23, 2025, 10:10am

My experience with Claude 4, so far, is very good

BertramWittrock · May 23, 2025, 10:24am

Interesting that you guys have had positive experiences! Makes me keep my hopes high, because its personality isn’t that bad and I feel like it almost works, but it just falls through on “sense of reality” for me.

I’ve had some level of succes with it, but too often it does something along the lines of running a term command as such: “echo ‘building’” and then saying “perfect! Its building! Try and test it” - as if it’s simulating logs from an actual process but it’s just the logs it has created itself, and then interpreting those logs as output from the program. Like intentional delusion almost haha

It has also multiple times written “test scripts” that is litterally just 40 lines of console logs saying “testing x…”

I have tested it on a swift/electron project using AVFoundation for video stuff

(For benchmarks I was referring to https://artificialanalysis.ai - but it may not be the thinking model there)

I used the thinking model in cursor of course.

I have been switching between 3.5 for frontend, 2.5 pro for refactors, large edits and codebase information gathering and 3.7 for small scoped complex issues

T1000 · May 23, 2025, 10:36am

Claude 4 is a new model so there are likely edge cases that Anthropic and Cursor may have to adjust. Now that its publicly its good to report bad respoonses with the thumb down in chat so they get info if things go wrong or not.

Otherwise feel free to share here some of your approaches and several of us forum members will try to help you if we see anything that could be improved from our experience.

cocode · May 23, 2025, 10:36am

For me feels like Happy Hours

BertramWittrock · May 23, 2025, 10:57am

I have actually completely missed the thumbs down feature, even though I have been using Cursor for more than a year, so will make sure to use it

Glad to hear it works for you guys, I will try and test it out some more then with different approaches, as it may be my form of interaction that screws it up

Thanks for responses

Marlon · May 23, 2025, 1:05pm

Yea, Claude 4.0 refactored a very complex project in like 15 and fixed major bugs that other models were choking on. Well worth the $. For those complaining about cost, go somewhere else or don’t use AI. Instead of driving 30 miles, you can walk! That’s your option.

oneovernever · May 24, 2025, 1:47am

On my end, Sonnet 4 is horrible… Like horrible horrible. It doesn’t respect architecture, suggests deletion as if completely clueless on the codebase. Gets stuck running python commands. Switched to gemini 2.5 pro and it solved it instantly. I was really open to working with it (especially at 0.75x cost). I wonder if it’s overloaded or something (?)

anilaknbtech · May 24, 2025, 6:41am

Sonnet 4 thinking works really well for JavaScript. Huge improvement compared to Sonnet 3.7 OR Gemini 2.5 Pro.

BertramWittrock · May 24, 2025, 12:29pm

For me it’s not that I dont want to spend money or complain of the cost being too high for quality LLM’s, simply that if I spend premium prices I expect a premium model! If I can get 3.7 or gemini 2.5 for equivalent prices, why would I spend more for a seemingly worse model - but It turns out that experience is not universal

Artemonim · May 24, 2025, 12:38pm

3.7 uses Cursor tools better than Gemini 2.5 Pro.
4.0 is weird. It has a problem with formatting C# code. And also it can call the edit tool for one (Kotlin) file many times, instead of making all edits at once (like Gemini).

But we must give credit to Anthropic for offering a discount during the initial period after the release.

rsboarder · May 24, 2025, 1:03pm

Yes, it’s horrible for my nextjs projects. o4-mini is way more better with the same prompts and roles.

arthur-zhuk · May 24, 2025, 1:31pm

Same exp here. Once you start working on larger code bases you see how bad sonnet 4 is. Gemini has been much better.

Topic		Replies	Views
People, Your Honest Opinion Discussion	23	2433	March 18, 2025
New Claude 3.5 already worse? Bug Reports	5	1348	October 28, 2024
Claude 4 Sonnet & Opus now in Cursor Featured Discussions featured , feat-extensions	49	10527	May 24, 2025
Cursor/Claude Sonnet feedback Bug Reports	19	1061	April 19, 2025
What's Wrong with Cursor Lately Feedback	31	6218	March 13, 2025

Sonnet 4 is really bad?

Related topics