I am reporting a critical failure in the billing model regarding Agent performance.
The Issue: I am being charged for “hallucination loops.” The Agent confirms a task is done, but delivers non-functional code. I then incur further costs as the AI attempts to fix its own errors.
Billing Contradiction: My dashboard labels $66.90 in usage as “Included in Pro”, yet support claims these are “valid on-demand charges” that cannot be refunded.
Product Failure: This is not “usage”—it is a product defect. Charging customers for the API costs of a model’s failure to perform its core function is unacceptable. I request a human manager review this case for a refund of usage generated by verified model hallucinations.
I get the frustration — watching costs climb while the agent spins its wheels is genuinely annoying. But I think the framing here is off.
The AI isn’t a contractor you hired to deliver working software. It’s a power tool. If I rent a nail gun and put a dozen nails in the wrong place, I don’t send the invoice back to Home Depot because the tool “failed to perform its core function.” I just missed.
Every token the model generates — including the bad ones — costs Anthropic/OpenAI real compute. Cursor is passing that cost through. That’s not a billing defect, that’s how usage-based pricing works. The model doesn’t know its output is wrong. It doesn’t “hallucinate” on purpose. It produced tokens, you consumed tokens, the meter ran.
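To make the metering concrete, here's a back-of-envelope sketch. The per-token rates are placeholders I made up, not Cursor's or any provider's actual pricing:

```python
# Back-of-envelope pass-through billing. Rates are hypothetical
# placeholders, NOT real pricing from Cursor or any model provider.
INPUT_RATE = 3.00 / 1_000_000    # dollars per input token (assumed)
OUTPUT_RATE = 15.00 / 1_000_000  # dollars per output token (assumed)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    # The meter doesn't know whether the output was useful; it just counts.
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# An agent that loops through 40 requests of ~20k tokens each costs the
# same whether the code it produced works or not.
total = sum(request_cost(15_000, 5_000) for _ in range(40))
print(f"${total:.2f}")  # -> $4.80
```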
The part we actually control is the loop. If the agent confirms something is done and it isn’t, that’s the moment to stop, read the code, test it yourself, and course-correct manually instead of letting it keep swinging. The skill with these tools isn’t prompting — it’s knowing when to take the wheel back.
None of this means the tooling can’t improve (better hallucination detection, spend caps, auto-stop on repeated failures — all fair feature requests). But framing working-as-designed token usage as a “product defect” that warrants a refund is a tough argument to make.
@PizzaConsole I appreciate your perspective, but the nail gun analogy fundamentally misrepresents what Cursor is selling.
A nail gun only fires when I pull the trigger. It doesn’t independently evaluate its own work, turn to me and say, “The cabinet is perfectly built!”, and then automatically fire 500 more nails into the floor while I’m reviewing the first one.
Cursor is not selling raw, passive API access; they are marketing and selling an “Agent.” The entire value proposition of an Agent is its autonomy and reasoning loop. If the user has to micro-manage and babysit every single token generated to prevent the Agent from financially draining them in an infinite hallucination loop, then the Agent feature is a liability, not a power tool.
I completely agree that OpenAI/Anthropic charge for compute regardless of quality. However, Cursor controls the orchestration layer. Cursor built the autonomous loop. Profiting from a runaway loop that lacks basic guardrails (like auto-stop on repeated failures or detecting zero-value token generation) while the UI labels these costs as “Included” is exactly what makes this a product defect.
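And the guardrail I'm describing isn't exotic. Here's a rough sketch of what the orchestration layer could do (`run_agent_step` and `verify_outcome` are hypothetical stand-ins, not Cursor's actual internals):

```python
# Sketch of an orchestration-layer guardrail: stop the loop after repeated
# failed verifications instead of letting the meter run indefinitely.
# run_agent_step and verify_outcome are hypothetical stand-ins.
MAX_CONSECUTIVE_FAILURES = 3

def run_with_guardrail(task, run_agent_step, verify_outcome):
    failures = 0
    while failures < MAX_CONSECUTIVE_FAILURES:
        result = run_agent_step(task)
        if verify_outcome(task, result):  # e.g. tests pass, planned files changed
            return result
        failures += 1
    # Halt and hand control back to the user before more spend accrues.
    raise RuntimeError(f"Agent failed verification {failures} times; stopping.")
```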
When a company monetizes a faulty autonomous loop and puts the entire financial burden of the software’s lack of guardrails onto the user, it is an unethical billing practice.
Right from the start, the agent hallucinates instead of doing the actual prompt, wastes tokens, and charges more. Maybe that's part of the training too, who knows. The fun part is that the agent claims it did everything correctly while the token count runs into lakhs (hundreds of thousands).
Fair point on the nail gun. I’ll take the L on that analogy. You’re right that an Agent that autonomously loops, self-evaluates, and says “done” is a fundamentally different product than raw API access. That distinction matters.
Where I still disagree is the leap from “the guardrails should be better” to “this is an unethical billing practice” and “I’m owed a refund for all of it.”
You’re describing a product that’s immature, not one that’s fraudulent. Cursor shipped an agentic loop without robust auto-stop, spend caps, or hallucination detection. That’s a real gap, and I genuinely think you’d get traction framing it as a feature request or even a formal product complaint. But the Agent mode is opt-in, the pricing model is documented, and the known limitations of LLMs didn’t suddenly become a secret the moment Cursor wrapped them in an autonomous loop.
The “Included in Pro” labeling issue is the strongest part of your case, honestly. If the dashboard says included and support says otherwise, that’s genuinely confusing UX and worth escalating. But that’s a labeling/communication problem, not evidence that the billing model itself is designed to exploit you.
I think we actually agree on the fix: guardrails, loop detection, spend limits, clearer cost attribution. Where we diverge is that you see the current absence of those features as something that entitles you to a refund, and I see it as a rough edge of a young product that we all opted into knowing these models hallucinate. The agent doesn’t change the underlying reality that nobody has solved hallucination yet. It just makes the consequences more expensive when you’re not watching.
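A spend limit, for what it's worth, is genuinely cheap to build at the layer that already meters usage. Purely illustrative; `estimate_cost` and `send_request` are invented names:

```python
# Illustrative per-session spend cap: refuse to start another agent request
# once the budget is gone. estimate_cost and send_request are invented.
class BudgetExceeded(Exception):
    pass

class SpendCappedSession:
    def __init__(self, budget_usd: float):
        self.budget_usd = budget_usd
        self.spent_usd = 0.0

    def request(self, prompt, estimate_cost, send_request):
        if self.spent_usd + estimate_cost(prompt) > self.budget_usd:
            # Stop before the meter runs, not after.
            raise BudgetExceeded(f"Budget of ${self.budget_usd:.2f} reached.")
        response, actual_cost = send_request(prompt)  # returns (text, dollars)
        self.spent_usd += actual_cost
        return response
```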
Also, and I say this respectfully: Cursor's Terms of Service cover a lot of this explicitly.
Section 1.4 states that suggestions "may contain errors or misleading information" and that you agree you're "responsible for evaluating, and bearing all risks associated with" them.
Section 14 delivers the service "AS IS," with no warranty that it'll be error-free, and states that "any use of Suggestions from our Service is at your sole risk."
Section 1.7 puts full responsibility on the user for auto-executed code.
Section 4.1 says fees are non-refundable except as required by law.
You agreed to all of this at sign-up. I know reading terms of service isn't anyone's idea of a good time, but if you're going to make legal claims like "unethical billing practice" and "product defect," it's worth knowing what you actually agreed to first. The TOS doesn't support the framing of a refund here. It actively contradicts it.
@PizzaConsole I appreciate the healthy debate. To refine my point: I fully accept that AI is not perfect and will make mistakes. I have no problem paying for usage even when the model doesn’t get the logic right on the first try. That’s the nature of the tech.
My issue is with workflow integrity. The problem occurs when the Agent confirms it understands the task, proposes a specific plan that I approve, and then claims to have executed it “perfectly”—only for me to find it ignored the plan and made unauthorized changes elsewhere.
I shouldn’t be billed for the extra tokens required to clean up a mess created by an Agent that deviates from its own confirmed execution path. While the TOS covers “errors,” it shouldn’t be a shield for a broken feedback loop that profits from failing to follow a mutually agreed plan.
@Suchetan_B You hit the nail on the head. It’s incredibly frustrating to pay for “expert analysis” that comes back with terms like “Probably” or “It’s possible.”
Programming is an exact science; the code is either in the file or it isn’t. When the Agent charges us to deliver “uncertain guesses” after claiming it performed a check, the value proposition of the tool breaks down.
We are paying for the Agent to be a partner in development, but right now we are paying for it to be a “confident hallucinator” that doesn’t even respect the plan it just proposed. It’s a defect in how the autonomous loop handles verification and billing.
@Eduardo_Bicudo I hear you, and I appreciate you refining the argument because it forced a much better conversation than the original post. The plan deviation scenario is genuinely frustrating and I’m not dismissing that.
But there’s a practical problem nobody in this thread has addressed: how would you even prove what’s refund-worthy? Where’s the line between “the agent made a reasonable attempt that didn’t work” and “the agent deviated from the plan and wasted tokens”? Who draws that line? Do you submit your approved plan and the agent’s output to Cursor support and ask them to adjudicate whether the deviation was egregious enough to qualify? At scale, that’s unworkable. Every user who got bad output would claim their tokens were wasted by a “broken feedback loop” rather than by a model doing what models sometimes do.
I will say, Cursor doesn’t help themselves here. The marketing leans heavy into the “AI partner” framing without doing enough to set realistic expectations about what these models can and can’t do reliably. When you sell the dream of an autonomous coding agent but the reality is a probabilistic model that confidently guesses wrong sometimes, that gap is going to create exactly this kind of frustration. Programming is exact. LLMs are not. Every output is a statistical prediction, not a lookup. When the model says “probably” or “it’s possible,” it’s actually being more honest than when it confidently tells you something incorrect.
What Cursor really needs is clearer documentation around what the agent is actually capable of, what its known limitations are, and what “agent mode” realistically means in practice. Something like a plain-language glossary of how the product works, what “autonomous” actually implies, and where human oversight is still expected. Right now there’s a massive expectations gap between the marketing and the experience, and threads like this are the result.
Final Update: Moving away from Cursor due to unaddressed workflow defects
I appreciate the tips shared here, including those from support. However, I want to clarify that I am an experienced LLM user and I have been practicing these “best practices” for a long time with other models:
Reloading chats frequently (Ctrl+N).
Breaking tasks into small, focused chunks.
Including only strictly necessary files.
Selecting the specific model for the task.
Even with these precautions, the problem persists: the Agent confirms a plan, fails to execute it, and I am billed for that failure.
If a “power tool” requires the user to be a constant babysitter just to avoid financial drain from known hallucinations, it’s not ready for professional deployment. I’m tired of paying for “uncertain guesses” and unauthorized code changes.
As a result, I am stopping my use of Cursor. I’m moving to other tools where the billing is more transparent and I’m not penalized for the model’s inability to follow its own plan. Thanks to @Suchetan_B and @PizzaConsole for the debate.
Not following the whole discussion, but a short note: Cursor needs reliable loop detection. If the current stream repeats itself over and over, it should be relatively easy to intercept it and cut the cost by breaking the request; a naive version is sketched below.
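Just to show how cheap the interception is (window and threshold picked arbitrarily, and the stream hookup is hypothetical):

```python
# Naive stream loop detector: flag when the tail of the output keeps
# repeating in the recent text. Window and threshold are arbitrary; a real
# detector would use hashing or n-gram stats, but interception is cheap.
def looks_like_loop(text: str, window: int = 200, repeats: int = 3) -> bool:
    tail = text[-window:]
    if len(tail) < window:
        return False
    recent = text[-window * (repeats + 1):]
    return recent.count(tail) >= repeats

buffer = ""
# for chunk in stream:           # hypothetical token stream from the model
#     buffer += chunk
#     if looks_like_loop(buffer):
#         break                  # cut the request here and stop the meter
```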
And if Cursor trims the context window below the model's actual capability, and the model therefore tends to loop because of badly set boundaries (a tendency most models other than Claude have), that should be Cursor's responsibility.
@SKiel Exactly. You nailed the technical responsibility. If the orchestration layer (Cursor) fails to detect a repetitive loop or mismanages context boundaries, the financial burden shouldn't fall on the user. Since support insisted these were "valid charges" despite the clear product defect, I've decided to move on. I'm switching to Windsurf now. I prefer a tool that doesn't penalize the user for its own architectural gaps. Thanks for the input!