AI-Based Quality Control for Code Outputs

Right now, Cursor has a model trained to apply code, but no secondary model reviews outputs before handing them to the user. As a result, AI-generated responses sometimes just copy existing code with no meaningful change, yet users on usage-based pricing are still charged and Fast credits are still deducted.

This creates a clear issue: if the AI provides an unusable or redundant response, there's no safeguard in place to catch it before it reaches the user. That puts the burden on us to manually request refunds and track failed outputs, which shouldn't be our responsibility.

A simple solution:

Cursor should implement an AI-powered quality control system that checks:

  • If the output is a direct copy of the user’s existing code.
  • If changes meaningfully align with the request (e.g., if you ask to adjust a border radius, it shouldn’t randomly change a button color).
  • If the output meets a baseline usability standard before being billed.
  • Other edge cases where the output would be of no meaningful use (a rough sketch of the first two checks follows below).
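
To make the idea concrete, here's a minimal Python sketch of what the cheapest of these checks could look like. All of the names (QualityVerdict, check_output, the commented-out review_model call) are hypothetical, not Cursor APIs; the point is just that a copy/no-change check is inexpensive to run before billing.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class QualityVerdict:
    """Hypothetical pre-billing verdict: bill, or hold with reasons."""
    billable: bool
    reasons: list[str] = field(default_factory=list)


def count_changed_lines(original: str, proposed: str) -> int:
    """Count added/removed lines between the existing code and the proposed output."""
    diff = difflib.unified_diff(original.splitlines(), proposed.splitlines(), lineterm="")
    return sum(
        1 for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )


def check_output(original: str, proposed: str, request: str) -> QualityVerdict:
    """Run cheap checks on a proposed edit before it is billed (illustrative only)."""
    reasons = []

    # Check 1: the "edit" is a verbatim copy of the user's existing code.
    if proposed.strip() == original.strip():
        reasons.append("output is identical to the existing code")
    # Check 2: nothing actually changed, even though the texts aren't byte-identical.
    elif count_changed_lines(original, proposed) == 0:
        reasons.append("no lines were actually changed")

    # Check 3 (placeholder): a secondary review model could score whether the change
    # matches the request, e.g. a border-radius request should not touch colors.
    # relevance = review_model.score(request, original, proposed)  # hypothetical API

    return QualityVerdict(billable=not reasons, reasons=reasons)


if __name__ == "__main__":
    before = "button { border-radius: 4px; }"
    after = "button { border-radius: 4px; }"  # the model returned the same code
    print(check_output(before, after, "increase the border radius to 8px"))
    # -> QualityVerdict(billable=False, reasons=['output is identical to the existing code'])
```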

What happens when a bad output is detected? Options could include flagging it with a reason, re-prompting with an explanation, or simply notifying the user that they may want to restart due to context overflow. The goal is simple: limit how often we’re paying for unusable outputs.
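
A similarly rough sketch of that escalation path, again with made-up names; none of this is an existing Cursor mechanism, just one way the three options above could be ordered:

```python
from enum import Enum, auto


class Action(Enum):
    """Hypothetical follow-ups once an output fails the pre-billing checks."""
    FLAG_WITH_REASON = auto()   # surface the reason and skip the charge
    REPROMPT = auto()           # retry once, telling the model why the first attempt failed
    SUGGEST_RESTART = auto()    # e.g. when context overflow is the likely cause


def choose_action(retries_used: int, context_near_limit: bool) -> Action:
    """Pick a follow-up for an output that failed the quality checks."""
    if context_near_limit:
        return Action.SUGGEST_RESTART
    if retries_used == 0:
        return Action.REPROMPT
    return Action.FLAG_WITH_REASON


if __name__ == "__main__":
    print(choose_action(retries_used=0, context_near_limit=False))  # Action.REPROMPT
```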

This isn't just a benefit for users; it helps Cursor maintain quality control while reducing refund requests and unnecessary overcharges. AI mistakes happen, but charging for them without a review system isn't sustainable.

Would love to hear thoughts from the community; this seems like a fair and necessary improvement.