Claude 3.7 overzealous and overstepping its bounds?

Just a general rant and wondering if others are experiencing this.

I've been trying Claude 3.7 since launch. Compared with 3.5 it seems … meh in terms of accuracy and intelligence. Honestly, I’ve not noticed it being all that much smarter.

But it feels way more “agentic”, which I guess is the point … but sheesh, it’s actually becoming a problem, because it’s very difficult to direct and unpredictable in how far it will go.

Example 1 (basic intelligence)

I had a project that didn’t have Tailwind installed, but had DOM nodes with Tailwind classes on them (that obviously weren’t doing anything). The UI looked perfect already. So I asked it to remove the unused Tailwind classes and clean up, and explained that the UI looked perfect as is, so it should remove the classes and nothing else. It removed the classes, THEN proceeded to write plain-CSS equivalents for all of the Tailwind styling (thus breaking the UI). I had to explain this one with a “follow me very carefully now and think through this logically…”

Example 2 (overstepping)

I implemented an MCP server to connect to my tasks database. I asked it to check my tasks for today. It did that, and then IMMEDIATELY assumed it had enough knowledge and context to go ahead and do them and started executing without any additional context or planning. I YELLED at it. (I updated my rules for it too … lesson learned).

Example 3 (overstepping)

I had Vite running and had test files. Claude made the requested changes, then, before I had a chance to review them, decided to test the implementation, running builds and creating duplicate test files. OK, kinda nice of it to try to be thorough in one shot, but just unexpected. It’s like, you do your thing, and let me do my thing.

Still figuring out how to wrangle all this, but in my experience Claude 3.5 as an agent struck the right balance between knowing when to go and when to hold back.

You need to temper 3.7 with some “rules for the AI” to stop it being too proactive. I imagine the Cursor team will be changing their prompts to help with this in the future.

Ha, believe me, I've tried it and done it. My rules for AI are starting to get a bit ridiculous (perhaps contributing to the problem). But it’s frustrating to have had rules in place for this long, and then suddenly have to spell out some of the most basic communication rules that I never had to before, and it seems hit or miss whether it can follow them.

Yeah, I’ve found 3.7 basically completely unusable. Keeping it on task is nearly impossible, and it doesn’t seem any better at the kinds of programming tasks I’ve been giving 3.5. I’ve since gone back to 3.5 and am having a much better experience.

ChatGPT-4.5 appears to be excellent, but impractical from a cost perspective.

This…


[Screenshot 2025-03-10 at 8.19.26 PM]

Claude 3.7 seems far more tame now after Cursor 0.47 (like it used to be 🙂). Feels much more balanced again, nice to see.