Ridiculous excitement over new models when none of them work very well, and Sonnet has gone backwards too

The new Cursor updates have taken the product backwards. There’s too much focus on adding models at the expense of the core product, and the new versions aren’t using Sonnet very well either. Why are we now getting so many short responses and questions like ‘Would you like me to focus on fixing this specific issue?’ These seem designed to use up fast requests, if you ask me. I’m losing patience with this product. It’s costing too much, and not just in money: the output is dreadful at times, there’s no safety in place (it happily deletes critical business logic), and even Sonnet has got worse. I think they should stop adding models and sort out the core product offering, otherwise it’s just not worth it.

rant over.


That’s not my experience at all; I’d say everything is working better than ever (I only use claude-3.5-sonnet). Maybe the way you’re prompting is the issue?


There are many moving parts in processing your prompt: your local .cursorrules, the model used, the size of your project, and the scope/bounds of your request to the model (@codebase vs ‘fix one specific file’). For me it’s a never-ending process of tweaking them and choosing different parts for different prompts. So yeah, it’s not easy, but now I don’t get those ‘Would you like me to focus on fixing this specific issue?’ questions.
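For example, my project rules spell out the stack and the bounds up front. The wording below is just an illustrative sketch (a .cursorrules file is free-form instructions, so phrase it however suits your project; the stack named here is made up):

```
# .cursorrules (illustrative sketch, adjust to your own stack)
This is a TypeScript/Node project using Express and Prisma.
When I point you at a single file, change only that file; do not refactor elsewhere.
Never delete existing business logic unless I explicitly ask you to.
```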


ChatGPT-o3 so far is almost hireable. Like all LLMs you have to bird-dog it on context and strategy, and it’s a lot slower than Claude, but it’s also doing vastly higher-quality work.

The only issues with o3 are a low bias for action (it likes to over-confirm) and a tendency to say “okay, done!” when it’s not… but that’s like when you ask Claude if it’s finished yet, then have to stop Claude before it can put its dirty paws in the pastry.


Agree to some extent. I’m not a fan of Cursor rules: if I scan through a codebase I can see the stack, patterns and packages, and the model should be able to do the same. That ability seems to be going backwards with Cursor and Sonnet.

“Almost hireable” made me chuckle. Very true. Agree with flipping between models, too. I think the key is that some models have seen a problem more often than others, so the training data is stronger, but that’s usually only on easier problems. The issue is when you’re working on things the models won’t have been trained on; that’s where the enterprise cost-benefit case is weaker, and it comes back to “almost hireable”. Thanks for your balanced response, which is certainly more balanced than my middle-of-the-night, stressed-out dev’s rant.

My prompting is ■■■■■■ perfect thank you :grinning:

Enterprise doesn’t use LLMs unless they’re Microsoft partners, lol. Or maybe I’m biased, being ex-Dell and knowing how many light years behind my little startup agency they are in terms of tooling.

We’ve actually only added DeepSeek recently, specifically because it offers great performance for the cost! Beyond that, we don’t often add new models unless they’re exceptional compared to what we already offer.

The short responses and confirmation questions are pretty much ‘out-of-the-box’ behaviour for Cursor, but our various “rules” features are meant exactly for this purpose: you can instruct Cursor not to ask follow-up questions and to make more assumptions, or to plan out its actions more before executing them - whatever you want!
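For example, a couple of rules along these lines (the wording is illustrative; rules are plain instructions, so phrase them however you like):

```
Do not ask follow-up or confirmation questions; make reasonable assumptions and state them.
Before editing, briefly outline your plan, then carry it out in one pass.
```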

If you have specific and clear examples of where Claude is not working as well as you think it should, do send them over!