People, Your Honest Opinion

I want to ask the community if anyone is having the same problems with Cursor. When the update to Sonnet 3.7 came out, the code quality was simply incredibly good and precise. Then came all the problems with rate limits and high loads.

By now, the tool has become nothing but frustration for me. I don’t understand what the company changed, but I personally find it incredibly difficult to work with. 3.7 Sonnet doesn’t feel like 3.7 Sonnet. The agent ignores instructions in the prompts, makes changes on its own, the code quality is sometimes simply ridiculous, and it feels like it immediately loses context.

I don’t want to accuse anyone of anything, but I exceeded the 500 requests in the Pro plan and now use the tool on a usage-based basis. Since then, Cursor simply hasn’t been able to make even simple adjustments. Has anyone had similar experiences?

8 Likes

I don’t think it’s just you, but this is a biased sample because I came here out of frustration. Still, in my experience 3.7 did seem to come out of the gate strong and is struggling now. Or maybe it’s just that I am asking it to do more now than before. No, it’s not just that. It’s obviously more sluggish and struggles with basic things.

4 Likes

The first 2 days of 3.7 being added to Cursor were magical - every prompt ended up with first-time success like 90% of the time. I’d pay 3x just to have that again. You’re not alone.

5 Likes

Exactly the same here.

Yes. Same here.

Give gemini-2.0-flash-thinking-exp a try; I’m getting performance and results comparable to initial Claude 3.7, and o3-mini-high seems to be stable-ish. Each has its behavioral quirks, and longer, more complex context seems to make them more prominent. I’m not very fond of DeepSeek R1, but some people seem to work with it just fine.

For context, my Claude issue is that it “lies”, almost as if it “likes to lie”, which I find quite interesting. If I push it to do a more complex task, it’ll simulate the results - a fast, cheap initial SWAG, no problem. It’s when it is questioned whether it simulated or “did the work” that it repeatedly makes false statements about what it has done. I’ve even had it claim to have erroneously deleted the UI of an app, which it had not done, and insist that it had. To be fair, I’ve been able to get the same behavior out of every model I’ve tried, to varying degrees, but Claude is so far the most consistent about it.

2 Likes

Thank you for the tip, I will try Gemini!

Lately, I’ve been spending a lot on useless requests. I’ve tried setting rules, but they’re simply ignored. Most of the time, I’m wasting requests, wasting time, and sometimes it even destroys things randomly or makes unexpected changes.

It feels risky to even consider using it in production. I’ve wasted so many hours on this, and of course, a lot of money as well. It’s incredibly frustrating.

So, I’m canceling my subscription and exploring other options like…


I need reliability in both the results and the way it operates. If I tell the agent “do not create a new interface,” I need to trust that it won’t go ahead and create 10 new interfaces in random files. If I specify “do not change the schema,” I expect it to respect that.

But Claude? It just randomly ignores rules and does whatever it wants.

3 Likes

I can also agree that Cursor seemed like a godsend when Claude 3.7 was first released, and I was excited that this quality was the new future of Cursor. It felt like a Cursor 1.0. I find the current Claude 3.7 + 0.46 to be at most as good as Claude 3.5 + Cursor 0.45 on my personal computer. On my work computer, Cursor 0.46 is almost unusable even after every hotfix, so I’ve been downgrading to Cursor 0.45.

The thing is, even though we can downgrade and use Claude 3.7 in the downgraded version, it still seems not much better than Claude 3.5 + Cursor 0.45.

Which makes me curious: is there any way to emulate fresh-release Claude 3.7 and Cursor 0.45? If that quality cost too much to maintain, then I am sure I and other users would love to pay more to get the same quality back.

I think fresh-release Claude 3.7 + 0.45 may have set a new standard for those of us who got a chance to use it, which is why we are especially noticing the new issues now. I wonder whether, if we had not experienced that, we’d find 0.46 (when not behind a corporate firewall) a decent upgrade.

1 Like

Sonnet 3.7 works very well for me when it works at all, which is not often. I am experiencing “High load” errors more and more, to the point where it has become completely unusable.

1 Like

In general, I feel like Cursor AI has been getting worse and worse for the last 2 weeks. Up until then, I was very impressed with the results, but now I am misunderstood so often, clear instructions are no longer followed, and numerous errors occur, such as files no longer being changed or editing simply stopping in the middle. Or changes that have nothing to do with the actual task are simply made without asking. Using Cursor AI is now just frustrating.

1 Like

I would also say: the Cursor version number still starts with 0 right now. This stuff is bleeding edge and they are pushing updates sometimes twice a day.

The reality is this: it’s an arms race right now. Getting stuff out faster is more important than ensuring it works perfectly. You can’t have bleeding edge and bug-free. At some point, I’m guessing they will get into an LTS pattern, but there is a ways to go.

1 Like

Well, I did try the other options out there over the last few hours… and I should say each company has its perks… there are a few things I like in the new ones, and some I like in Cursor. I guess the idea, at least for the moment, should be to work with different tools depending on the task.

I totally agree on the “lying” behaviour part. At first I thought there might have been some mistake on my part in the way I was designing my prompts, but after 2 days and all credits gone… I was like “FFF” it. Plus, Cursor kept crashing on me, losing the last interaction and going back to the chat state from 4-5 exchanges before the crash.

It ignores instructions and loses context too quickly. It’s frustrating, especially when you rely on it for precise code generation.

Another trick is refreshing context frequently—sometimes wiping history or reloading the model helps maintain consistency. You could also try combining Cursor with another LLM-based tool for validation. Have you experimented with different prompting techniques to see if it improves accuracy?

I’ve tried out a few things in the last few days. I simply believe that once a project gains a bit of complexity, Cursor becomes pretty useless. Especially since the context of Cursor’s AI has been shortened.
It’s just not consistent in the quality of its output, and unfortunately, the output is often unusable. Add to that the numerous bugs, such as requests being terminated in the middle of processing, etc. At the moment, it doesn’t make sense for me to continue working with this tool.

No model is provably the same from one day to the next, for several reasons. One of them is the current date, which sits inside the prompt, so it alone changes the output. Inference depends on many things and will change if anything changes, even the date, not to mention temperature, prompt contents, or the context window.

But one of the most important changes is to inference cost: compute rounds at run time can effectively be reduced, or increased. The company will of course cut costs as much as possible, right up until the quality drops faster than the price.

The anthropology-themed company recently declared that its censoring steps will break roughly 3% of legitimate user prompts and increase its own costs by 20%+, just so it can promote itself as willing to handicap its products to appease certain policies.

Not to mention that Cursor’s infrastructure or application prompt might change over time. It will never be the same, UNLESS you download a local model and initialize the same parameters every time and never update anything.
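
To make the “download a local model and fix the parameters” point concrete, here is a rough sketch, assuming the Hugging Face transformers library and an arbitrary small instruct model (the model name is only an illustration, not a recommendation):

```python
# Minimal sketch: a locally run model with fixed settings gives repeatable output.
# Hosted tools may inject things like the current date into the prompt, so even
# "identical" requests can differ; locally you control the whole prompt and decoding.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"  # illustrative choice of local model

torch.manual_seed(0)  # fix randomness (only relevant if you later enable sampling)
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

prompt = "Write a Python function that reverses a string."
inputs = tok(prompt, return_tensors="pt")

# Greedy decoding (do_sample=False) removes temperature/top-p sampling entirely,
# so the same prompt, weights, and settings give the same completion on the same machine.
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```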

2 Likes

I would add to this that we do not know what Anthropic is doing day-to-day with their own system prompts, which models are running and when, or any of their alignment dung that they love forcing into the prompts.

The best we can hope for is that they listen when enough complain about it. But, Cursor isn’t solely to blame; this is more than likely an issue with Anthropic. I say this due to my experiences using every possible Anthropic communication option.

Their web-based chat performs the most reliably as far as responses go. Their API, depending on what time of day you’re using it, could be amazing, or it could act like it’s a squirrel that just figured out how to write bad code and has the world’s worst case of ADD.

I’ve had similar experiences using OpenRouter and other IDEs that connect directly to Anthropic, similar to Cursor. I believe that Anthropic uses a quantized version of its model at peak usage, which would decrease its intelligence.

I will say that the most recent version of Cursor has not only been the most reliable for me in terms of performance but also in terms of the agent/model actions. That could mean they changed any number of things: API settings, prompts, where prompts or directions get injected, what gets shown or not shown to the model, etc.

As @vibe-qa explained, no model will ever be the same from day to day unless you’re running it locally and controlling all aspects of the interaction.

All we can do is adjust and adapt.

3 Likes

Hi

Through my own experience with a large code base, I have found that the instructions, i.e. the prompt, must be very precise.
Give the exact instruction about what needs to be done, ask it to confirm your idea, and have it make a plan for implementation.
Then, when you have confirmed that both of you are on the same page, copy the plan the AI has made.
Open a new chat and, if the plan is long, paste one section of the plan at a time, or the whole plan if it is concentrated on changes in one file or one specific thing (see the rough example below).
The AI is not reading your mind. You need to be good at commanding it and giving it precise tasks.
Sometimes it gets it right away and sometimes it doesn’t.
It is all about how well you communicate with the machine.
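
A rough illustration of that two-step flow (the file and function names here are made up, just to show the shape of it):

First chat: “Read src/auth/session.ts and confirm my understanding: token refresh happens in refreshSession(). Propose a step-by-step plan to add a retry with backoff. Do not edit any files yet.”

New chat, after reviewing the plan: paste only the section of the plan that touches refreshSession() and add: “Implement exactly this section. Do not change the schema and do not create any new interfaces.”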

4 Likes

I just wrote a thread giving my honest opinion and the mods banned/deleted it from being viewed. Seems like they’re hiding any negative posts now. What a shame.

1 Like