Why does Anthropic not understand this?

Hey, I'm a technical person, and I'm not sure why Anthropic is making such aggressive models now.

Sonnet 3.5 was great: it stayed in control.

3.7? Bad

4? Slightly better than 3.7, but it's the same story: you ask it for one thing, and it implements every case it can think of.

I know Cursor rules help, but the models still shouldn't run ahead of us.

The aggressive agentic behavior may be good for non-technical people jumping into this, but trust me, we technical people like to stay in control and check every line the AI writes.

For me, GPT-4.1 is the king right now.

3 Likes

I agree, the model does a lot of unnecessary things. I think it’s because they want it to appeal to a wider audience — users who aren’t fully comfortable with AI yet, maybe somewhere around the intermediate level.

I believe it can be kept in check with clear rules and small, incremental tasks.
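For example, rules along these lines in a project rules file (just a sketch, the exact wording is up to you):

```
- Only implement exactly what I ask for; do not add extra cases or features.
- Before writing code, list the files you plan to touch and wait for my OK.
- Keep changes small and incremental; one task per response.
- Do not create new files (including .md docs) unless I explicitly ask.
```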

3 Likes

4.1 is like a friend, always checking before taking action.

1 Like

agency always increases with intelligence

1 Like

mine keeps making me .md files :rofl:

Tell it to discourage defensive coding, defaults, and fallbacks, and to check for existing systems, hooks, settings, or pages/components first. Or simply follow up with a prompt that has it remove the defensive fallbacks and default settings. Otherwise weak error handling takes the place of the real bugs we need to see fail. Tell it to trust the data flow and allow things to fail.
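Roughly the kind of rule or follow-up prompt I mean (wording is only a sketch, adjust to your project):

```
Do not add defensive fallbacks, default values, or error handling that
swallows failures. Before creating anything new, check whether an existing
system, hook, setting, or page/component already handles it. Trust the data
flow: if an assumption is violated, let it fail loudly so the real bug
surfaces. Remove any defensive fallbacks or default settings you already added.
```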

Theoretically we should have power over the entire directive, at our own risk, inside their IDE, up to the point where they have control. We will always be guessing about unseen implementation, which is where most of the advancements in the last year have been. Prompting language will start to outweigh raw model power in a meaningful way soon enough, and arguably already has. Prompting is just as important, because you can build similar agentic systems with a q8_0 model running locally in Ollama on a 3080 (see the quick example below). It takes a lot more prompting, but these AI companies have warehouses of cards far more powerful than that, so I'm left wondering what the deal is.

I don't understand why these organizations think it's appropriate to expect consumers to adopt new tech that embeds meaning in high-order, nested, gated, optimized functions, and to work around that question mark with a model that literally won't even accept being told to deterministically postpone questions that breach its directive, instead of implicitly shifting its context into dishonesty by omission. If Cursor can't change it, then I'm out, because they aren't anything more than an IDE, imo. Especially if they don't understand the importance of additively changing embeddings vs. letting us set them from scratch. I can only do so much within the 30 contexts they think are appropriate for me and everyone else. I can't understand why anyone experienced in LLMs would voluntarily put themselves in Cursor's position, unless they don't plan on existing much longer. Seriously, I don't know. They put an experimental Gemini model, with zero transparency about the directive of the underlying experiment, into a product you are paying for, when you can go to Google AI Studio and use it for free.
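Just to illustrate what I mean by running a quantized model locally (the exact model tag is only an example, pick whatever fits your VRAM):

```
# pull and run an 8-bit quantized model locally with Ollama
ollama pull llama3.1:8b-instruct-q8_0
ollama run llama3.1:8b-instruct-q8_0
```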