Great business model: Agent breaks. Agent eventually fixes. Pay both ways

I upgraded from Pro to Pro+ to leverage the “good stuff”.

What I got was agents/models that FUBARed while refactoring my code and cost me days to fix all the unintended consequences.

Sonnet 4.5 made a mess. I tried Grok Code to fix the mess, then ChatGPT 5.2codex…more messes and nonsense. Back to Sonnet 4.5.

About 10 minutes before we finally stamped out the last bug it had caused, my tokens ran out.

I had to stop it many times to debug myself and keep it from making a bigger mess or just wasting tokens on nonsense.

Waste of money in my case.

I accept that I may have been doing something wrong, but I was in love with Cursor Pro and was expecting 3X more awesomeness in Pro+. Instead I got a crap sandwich.


Hey, thanks for the feedback about the Pro+ experience. I get the frustration.

A few things that can help improve the agent’s output quality:

Rules and context: The agent works much better when it has clear project rules. Try adding a .cursorrules file with instructions on code style, architecture, and what it can or can’t change (there’s a rough example after these tips). Details here: Rules | Cursor Docs

Checkpoints: If the agent made changes you don’t like, you can roll back using “Restore Checkpoint” on earlier requests. These are local snapshots, separate from Git.

Prompts: Try to be more specific in your instructions. For example, instead of “refactor the code”, use “refactor function X, keep the current API, and add tests.”

Planning: For big refactors, it can help to start with a plan first: ask the agent to describe what it will change, then confirm before it edits. This can save tokens on fixes.
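As a rough sketch of what a starter .cursorrules could look like (the file is plain text, so the wording is entirely up to you; this is an illustration, not official syntax):

```
# .cursorrules (illustrative example; adapt to your project)

# Code style
- Match the existing formatting and naming conventions in this repo.

# Architecture
- Keep public APIs and function signatures unchanged unless the task says otherwise.
- Do not delete or relocate functions unless explicitly asked; if something looks unused, ask first.

# Scope of changes
- Only modify the files named in the request.
- If a fix seems to require changes outside that scope, stop, explain, and wait for confirmation.
```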

Could you share a couple of details so we can diagnose this:

  • What type of project is it (language and framework)?
  • Examples of prompts that gave bad results?
  • Do you have a .cursorrules file?

This will help figure out if the issue is context setup, prompts, or something model-related.

Thank you for the support and tips despite my venting.
I will refine my .cursorrules doc and see if that helps.
I’ll try to make better use of checkpoints too.

I will take your suggestions to heart and do everything I can to set my AI coding partner up for success.

My prompts are detailed, include context, and often include .md docs with rendered results and/or relevant data snapshots to consider, so the agent can see the real-world pieces of the puzzle. I totally get the “if you want a specific result, give specific instructions” principle.

The rollback in this case was problematic because the refactor of a class appeared solid, so we moved on to make many more dependent changes and finish up the downstream processes related to the class change.

The problems became apparent later while testing other things. The agent had removed or jacked up several functions that were not related to my instructions, such as deleting things it didn’t think needed to be in the class without putting the functionality elsewhere. It ignored the fact that those functions were in use.

I had backups of course, but it wasn’t supposed to change the outcomes of the class functions, just make the path to those results more efficient.

My biggest disappointment with my Pro+ experience was having to take multiple passes with detailed instructions to fix things.

The agent made a bunch of changes and said “fixed”…but it wasn’t fixed. I went at it again 3-5 times with specific feedback and specific logging of the process. Finally I gave up and fixed it myself, which defeats the purpose of having an AI coding partner.

I’m just working on a complex WP plugin. It’s not rocket science.
In one case it took half a dozen tries for it to figure out how to properly save an admin screen for a custom table. The save should have followed the same pattern used by several other screens for other custom tables in the same plugin. It was acting as if it were completely unaware of the context of the project.

I’m sure I’ll try Pro+ again at some point.


I always use Opus 4.5 for everything.


You can use Sonnet for planning and Grok Code or Auto to make the changes.

Remember, kids: these companies ALL get paid by usage. Why do you think these models are so long-winded? They are specifically designed to give the most verbose responses possible, driving usage through the roof.

No coincidence they’re called “tokens”, either. It’s just gambling with code.

Learn to swim, all! LLMs can be used in responsible and productive ways, but “agents” are largely just a way to scam you out of more money. They have their place here and there, when regulated heavily and assigned more deterministic data processing-type tasks.


I agree there’s no incentive for them to minimize token usage. However, I think we can curb the appetite by how we use it. Now that I’m deep into my project and don’t want the AI to thrash about making messes, I’m telling it to analyze only: no coding until we agree on the fix, then code.
I think we can also add instructions not to write essays every time it performs an action.
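For what it’s worth, the standing instruction I have in mind is something like this (my own wording, nothing official):

```
Analyze first: describe the problem and the fix you propose.
Do not edit any files until I reply "go".
After a change, summarize it in 2-3 short bullets; no long explanations.
```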

You definitely can rein them in, and I’d argue it’s absolutely essential. I liken it to getting a new device like a phone or a PC, where the first thing I need to do is uninstall the bloatware. :sweat_smile: No matter what LLM tool I’m using, if it’s leveraging a major frontier model, I need to prep it with detailed system prompts and instructions to mitigate its token churn. I was recently using Google’s AI Studio with its default settings, and the outputs to even the most basic requests were novels, 95% of them filler.
