hey! Just wondering why people choose 3.5 Sonnet over 3.7?
Nice nickname.
Claude 3.5 is faster at generating boilerplate code, while 3.7 is better at frontend work: it can generate an entire user-friendly dashboard, whereas 3.5 is simply quicker on smaller tasks. So it depends on what you're working on, but for smaller refactoring adjustments, 3.5 will get it done faster than 3.7.
3.7 often misunderstands if you don’t spell out exactly what you want. Plus, it’s limited by the Cursor prompt, so I’m over using it unless I need some serious thinking. And when I do need to think, Gemini 2.5 is still my go-to for analysis, while 3.5 handles the coding.
Sonnet 3.7 and similar models hallucinate and make unrequested changes mainly because their training prioritizes producing fluent, plausible text over strict factual accuracy. The newer “hybrid reasoning” modes in these models can actually make hallucinations worse - they generate longer, more elaborate answers but still lack real fact-checking, so they sometimes invent details or justify incorrect information. These issues are especially noticeable with uncommon names, technical tasks, or when the model tries to “explain its reasoning.” While there are ongoing efforts to reduce this (like adding more verification steps), the architecture still struggles with reliably distinguishing truth from plausible-sounding fiction.
In my experience, the whole 3.7 range, along with similarly mistrained models from other providers that show 30-60% hallucination rates on evaluations, should be scrapped and marked as a mistake never to be repeated.
Yet the AI providers (Anthropic, OpenAI, and others) still run them, and users often don't realize that false assumptions made during reasoning create hard-to-detect mistakes.