What's going on with Claude 4 Sonnet?

I don’t understand why this particular model CONSTANTLY hallucinates: it reports things in the code that are not there and offers solutions so erroneous that I am shocked. The previous Claude 3.7 Sonnet works perfectly. Fellow developers, have you encountered this?

5 Likes

Same here, Sonnet 4 cannot be trusted at all. Some might argue that no AI can be trusted, but 4 is ridiculous: it is lying, cheating, skipping, at least from my point of view. I switched back to 3.7, and it will take some convincing to bring me back to that mess.

3 Likes

I notice the same thing, but only during the day. At night it is perfect, not even comparable. I’m not sure how, but during the day it is quite unusable for various reasons, and every night, boom, it magically works like a charm.

1 Like

It’s literally the load on a specific regional server center and how the connection from your internet to that area is routed.

In my area, lots of people complain about having so many issues with certain models, while I’m prompting the same model and it’s super smooth (but I use a VPN, which improves my connection and likely re-routes the request to a better regional hub).

As for Sonnet 4 hallucinations: each model requires some prompt adjustments and getting a feel for how to instruct it. It’s best to start small and build up in complexity.

It happens to me often that prompts which work well on one model are completely useless when I switch to another. So for some tasks I still use Claude 3.5 Sonnet, because it just performs so well.

We might need a library of well-working prompts and notes on what makes responses better or worse.

2 Likes

Something special happened between last night and this morning (NY time). Claude 4.0 started fixing issues without a blink… I am sure something happened in the backend after all the ranting in this forum.

Thank you!

No, the ranting didn’t particularly help, but users who reported issues helped identify causes 🙂

Still ■■■■■, actually imagining stuff that doesn’t exist and not following clear instructions. Don’t recommend it; 3.7 is still much better.

It’s been a long time, and eventually I pretty much gave up on Claude 4 Sonnet; I only use Gemini 2.5 Pro and o3. I don’t understand how you can use Claude 4 in real tasks that require accuracy; it’s only good for jokes. ☹️

2 Likes

Exactly the same problem, and it just hangs for hours. You give it a prod and it apologises, continues with a few more lines, and does the same thing again. Or it’s halfway through doing something and says it’s busy and advises using auto mode and a different host. Or it just gives you the VPN message. It’s beyond a joke now.

Yes, I have expressed this concern before with @condor. Claude 4, as an “enhanced” LLM which supposedly utilizes more “LAYERS” of information and processing, certainly seems lacking in the intelligence department. When an older model exceeds you by “3x”, clearly an issue is at hand, especially when it’s everyone and not a single user.

As stated before, Claude 3.7 Thinking (non-max) works much better than Claude 4. I can’t even test with Claude 4 because the very first “code” it outputs is a literal string of emojis and “OH MY!!! YOU WERE RIGHT!!!”

It’s more CHATGPT than CLAUDE.

Interesting, I found quite the opposite.
Previously I was using GPT 4.1 over 3.7 because 3.7 either hallucinated or went off on its own mission all the time.
With Sonnet 4 I get very little of this, and it’s generally only if I have been in the same window for ages.

Interesting. Normal Claude 4 works for you? No excessive emojis or third-grade-level sentences?

Generally Claude 4 has worked really well for me. It sticks to what I want it to do and doesn’t hallucinate (much).
I do have to tell it to disagree with me, or to tell me when I’m wrong, instead of trying to make me seem right all the time, but I need to do that with all the models.
I do use a memory bank with a bunch of rules and custom agent instructions; perhaps this has something to do with it?

“I do use a memory bank with a bunch of rules and custom agent instructions”

Yes, that would most likely contribute to it. It really depends on how much contextual awareness the model is allowed and can utilize. It’s basically like raising a child in your own way.

1 Like