When an Upgrade Feels Like a Downgrade — My Claude & Cursor Experience (Will This Keep Happening?)

When Claude 3.5 first came out, it was incredibly efficient. It saved me a lot of time, especially on the many projects I worked on through the editor. However, as the release of 3.7 approached, I started noticing a decline in 3.5’s performance. It wasn’t just a vague feeling; when I gave the model the same tasks it had previously handled well, it now started making mistakes or producing incomplete results. Even simple tasks started to cause problems.

When 3.7 was released, most of these issues seemed to be resolved. Initially, it worked very well and was stable. But after a while, I started to notice a similar pattern. The way the model approached certain tasks changed, and the error rate increased. Sometimes it would develop assumptions on its own, treating things as if they existed when they didn’t, and producing responses based on these incorrect assumptions. This, inevitably, made the model feel as though it had been weakened.

In general, it’s clear that models are updated and changed over time. But seeing that these changes aren’t always improvements is really frustrating, especially for users who rely on these tools for their work. You rely on a model, build your workflow around it, and then, over time, it no longer delivers the same performance.

The biggest difference between 3.5 and 3.7, in my opinion, is context management. I used 3.5 extensively, especially on projects with multiple files, and it quickly became an essential part of my workflow. Its potential, particularly when paired with Cursor, allowed me to achieve more success than before. However, as the complexity of my projects increased and the number of files grew, serious problems began to arise. The model started to forget context, become confused, or generate irrelevant answers. In contrast, 3.7 handled these situations much more effectively — it struggled much less with complex tasks and context management, showing significant improvement compared to 3.5 in these areas.

However, over time, even this powerful model started to degrade. Honestly, I don’t understand why this happened. Was it for optimization, cost-saving, or something else? It’s unclear whether this issue was caused by Claude, Cursor, or perhaps both parties played a role in the changes that led to this. But no matter the reason, it feels like a disregard for the users.

You release a product, people get used to it, trust it, and build their workflows around it. Then, over time, you intentionally degrade the old product’s performance and release a new version, saying, “Here, now use this.” Deliberately weakening the old product and marketing the new one as a slightly better alternative is, in my opinion, unethical. This isn’t a sales strategy; it’s manipulation of users. Moreover, such approaches will lead to a loss of trust in the long run. Because eventually, people will start questioning: How can we be sure the next version won’t be intentionally weakened too?

I’m sure I’m not the only one experiencing this issue — I’m confident that many others are facing the same or similar problems as I am.

In the future, I may be able to explain the problem I’m experiencing in more detail and more clearly by providing concrete examples from the coding side, but I’m not sure I have the patience to do so.
For now, I’ll stick to describing the issues I’ve faced at a surface level.

11 Likes

I quite agree, I noticed a drop in performance with Claude 3.7. It used to do incredible things for me and now it’s really not so great.
I really hope the Cursor developers aren’t nerfing things on purpose.
In any case, I’d really like a solution, an alternative, or anything that gets me back to better performance, because I can’t move forward with my project.

I’ve noticed that it quickly loses the thread of the discussion now and creates a lot of useless files etc…

4 Likes

Hey there - I made a tool to help with that last problem you mentioned (which, tbh, is most of the problem)

Check out GitHub - taggartbg/bivvy: A Zero-Dependency Stateful PRD Framework for AI-Driven Development (or https://bivvy.ai)

It lets you create a contextual PRD + stateful task list for each feature, then step through it systematically. No more losing the thread.
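To give a rough sense of the approach (a generic illustration of a PRD + stateful task list, not Bivvy’s actual file format; the feature, function name, and endpoint below are made up):

```markdown
# Feature: CSV export (example)

## Context (mini-PRD)
- Users need to download their report data as CSV.
- Reuse the existing report query; no schema changes.

## Tasks
- [x] 1. Add an `exportReportCsv()` service function
- [ ] 2. Wire up a `/reports/:id/export` endpoint
- [ ] 3. Add a download button to the report page

## Agent rules
- Work on exactly one unchecked task per request, then stop and wait for review.
- Never check off a task without my confirmation.
```

The point is that the file, not the chat history, carries the state, so a fresh context window can pick up exactly where the last one left off.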

3 Likes

This is just capitalism at its core. Apple did this with its mobile products.
Purposely degrade the performance so users upgrade → a tried-and-true strategy to generate more revenue.

1 Like

models and agent workflows are changing, and they likely will be for the foreseeable future; this is what context and rules are for

I know it’s frustrating when models seem to get dumber or don’t work as well with agentic tasks, but when you find a workflow that works for you, it’s important to enshrine it as a rules file

don’t like a model making assumptions?
tell it not to

is the model making additional changes?
tell it to not make changes out of scope

does the model get off-task when working with a large codebase?
write an architecture document for it to reference

this technology is too new to settle into habits, and you can’t just trust that a dynamic technology like AI is always going to work exactly the way you need it to, so it’s up to you to decide how to use these tools

by their nature, LLMs work within the boundaries you provide, so create a scope that works for you
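for example, here’s a minimal sketch of the kind of project rules file I mean (assuming Cursor’s .cursor/rules/*.mdc format; the frontmatter fields, paths, and the ARCHITECTURE.md reference are just placeholders to adapt to your project):

```markdown
---
description: Project-wide guardrails for agent edits
alwaysApply: true
---

## Scope
- Only modify files I explicitly mention or that the task clearly requires.
- Do not create new files outside `src/` or `tests/` without asking first.

## Assumptions
- Never assume a function, file, or dependency exists; check the codebase and say so if it is missing.

## Process
- Propose a short plan and wait for approval before writing any code.
- Read `ARCHITECTURE.md` before adding new modules and follow its layout.
```

rules like these won’t make a weaker model smart again, but they do cut down on the out-of-scope edits and phantom assumptions described in this thread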

2 Likes

awesome repo!

this is the answer to pretty much every complaint I see about models not working as expected, an LLM will only do (or not do) what you tell it

writing a PRD is a fundamental step when you’re building software, and it’s no different for AI, you need to treat the LLM like it’s an engineer you’re bringing onto a new project

2 Likes

Need to take that with a grain of salt.
Often, lately anyway, you give it very specific tasks, and it runs right through the rules.
This happened to me today on more than one occasion.

  1. Do not write any code until I have approved the plan and task list → Writes code anyway.
  2. I specifically asked you not to write any code → Oops sorry, i’ll listen next time.

I just finished a small task list where little buddy went off the rails and wasn’t listening to anything I was saying.
Created random files in random locations in the codebase, not adhering to the architecture at all. I mean, created a bunch of files in the root ffs.

When asked to delete them, it said sure! And didn’t do it.
After asking 4 times to specifically delete the files it had just created, and telling it exactly which files to delete, it said sure! And then randomly did other work from the task list, without deleting the files.
This was a fresh context window and small task list.

Roads? We don’t need roads where we’re going…
Ef the rules.

1 Like

Yeah, 100%

I have a branch here GitHub - taggartbg/bivvy at projects to try to strengthen the execution flow of task lists (it also adds an optional “project.md” file). But now it’s generally really good at stopping when it’s supposed to. And it always respects {rest: “true”} flags.

Check out the last couple commits, I’m pretty happy with this ruleset.

(Also, I often switch to Ask mode when I want to make sure it doesn’t actually write code)

1 Like

valid points, I’ve run into it refusing to do a task while asserting that it can. Gemini’s gaslighting is particularly bad lol

I would still always recommend using rules specific to your workflow and keeping the scope of AI tasks as narrow as possible, but with something like Cursor, which uses additional models behind the scenes to orchestrate the main model, things can go off the rails even with rules

almost feels like they’re injecting prompts that could contradict some rules (e.g., “when user asks X always do Y”, “never say you can’t do something”)

I think there’s a way the Cursor team could officially integrate something like memory; I think the software can move in that direction.
They’ve said they’re working on a fairly innovative feature (hence the 50 testers), and I hope it’s a nice evolution that will help with all of this.

I may be talking nonsense, but

  • Automatically create discussion summaries
  • Create a persistent project task tracking table and launch task by task
    Etc.

Finally, there are ways to build things that would improve this. Why not a feature that generates our rules automatically after asking a few questions about the project, etc…

a persistent project task table could get problematic as you’re iterating on ideas in a broader fashion; these are things that should be user-determined rules files, not hardcoded mechanisms

from the outside looking in, it feels like the Cursor team is overengineering things and it’s making the models behave poorly; they should only be providing a suitable framework for each model to act as an agent within the context of a codebase and leave project-specific features up to the user to add (or have Cursor add)

the more fine-tuning they’ve done on the back end, the worse the UX has gotten

so far, 0.46 was peak

1 Like

We’d like the team to speak up and address the problems.
If it’s really a question of cost, I’d rather pay a higher subscription fee. But above all, I don’t want to be hampered by the agent mode and AIs.

1 Like

yes, 3.7 sucks now. i don’t know why it became dumb today.

1 Like

This is the biggest problem I think I’ve had with Cursor.
The lack of communication from the team.
I’ve built and led dev teams in my day, and this forum is essentially that: a dev team.
A team needs information, even just basic information, to feel confident and heard.
This forum is lacking that, and it doesn’t take much at all to achieve that.
What happens to bugs filed here? No idea.
Do they get auto processed into a kanban board? That would be a cool idea.
The number of people on here who are complaining, have ideas, or are frustrated is pretty substantial.

I couldn’t agree with you more. The team needs to improve in this area.
Personally, I’m looking for an alternative right now. There’s a lot of talk about Augment Code and the new VSCode agent; I’ll take a look at them.
I don’t know how the Windsurf agent behaves and whether it has been degraded as well.

I love Cursor, but there’s something unclear behind it that we’re not being told, and this lack of communication and transparency bothers me.

1 Like

I signed up for VSCode Agent today, spent the afternoon testing it. Free 30 day trial, we’ll see.
I’d rather support startups like Cursor, but the bugs, lack of communication, and secrecy are making me rethink it.
Filed a bug here today about basic, foundational IDE functionality. Not a good bug.
I have no idea if they’re even going to look at it, let alone address it, which is something I can’t afford. I can’t be fighting with the IDE while simultaneously trying to wrangle a toddler into doing tasks without going off the rails lol.
As of this morning, or right now, my time using cursor is paused until they can show a little more professionalism in their toolset, communication, vision, and future roadmap.
The bug I found today is a deal breaker unfortunately, hope they get their sh*t together through these growing pains.

1 Like

Yes. You get used to how a model operates, what it can and cannot do, and then that completely changes. By the time you realize it, the project is damaged and you need to revert heavily. There needs to be some level of consistency.