Needs built-in QA and self-check

Like any good developer, Cursor needs to check its code as it creates it. We all know the definition of “done”—and it ain’t spewing out whatever comes to mind. To truly enhance the reliability and accuracy of the code generated, incorporating a robust QA component is essential. Here are some suggestions on how this could be implemented:

  1. Automated Test Generation: Automatically generate unit tests alongside the code. (anyone doing this?) This ensures that each piece of generated code is accompanied by tests covering various scenarios, from typical use cases to edge conditions.
  2. Self-Testing Mechanism: After generating code, the AI should run these automated tests in a controlled environment. If tests fail, it should log the errors and analyze them to improve future outputs (a rough sketch of this loop follows the list).
  3. Incremental Learning: Establish a feedback loop where the system learns from failed tests and user feedback, adjusting its behavior to improve accuracy over time. This is key.
  4. Environment Simulation: Simulate different programming environments to ensure the generated code works under various conditions. This includes testing against multiple versions of languages and frameworks.
  5. Benchmarking Against Existing Code: Use established open-source projects to benchmark the generated code against best practices and identify areas for improvement.
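
To make (1) and (2) concrete, here is a minimal sketch of what the run-and-report step could look like. It assumes Python, pytest installed, and that the model's output can be written to a throwaway directory; `generate_code_and_tests` is a hypothetical placeholder for whatever call actually produces the code and the tests.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_generated_tests(code: str, tests: str) -> tuple[bool, str]:
    """Write model-generated code and its unit tests to a sandbox
    directory, run pytest there, and return (passed, output)."""
    with tempfile.TemporaryDirectory() as tmp:
        sandbox = Path(tmp)
        (sandbox / "generated.py").write_text(code)
        (sandbox / "test_generated.py").write_text(tests)

        # Run pytest in a subprocess with a timeout so crashing or
        # hanging generated code can't take down the orchestrator.
        result = subprocess.run(
            [sys.executable, "-m", "pytest", "-q", str(sandbox)],
            capture_output=True,
            text=True,
            timeout=60,
        )
        return result.returncode == 0, result.stdout + result.stderr


# Hypothetical usage: generate_code_and_tests() stands in for whatever
# model call produces the implementation plus its unit tests.
# code, tests = generate_code_and_tests(prompt)
# passed, report = run_generated_tests(code, tests)
# if not passed:
#     ...  # feed the failure report back to the model for another attempt
```

The key design choice is running the tests in a separate process with a timeout, so broken or hanging generated code only produces a failure report rather than wedging the editor.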

I agree that it’s important, but unfortunately that is an unsolved problem in AI. No model in the world is capable of this today. After initial training, RLHF has to be done to make a model useful, and too much “fine-tuning” beyond that point can cause all kinds of issues, like “catastrophic forgetting” or undoing the effects of the RLHF.

So today’s models can’t learn anything except within their (limited) context windows. And even for models that support larger context windows, the longer the context grows, the more expensive inference becomes.

Self-testing would definitely help, since it can ground agentic AIs, but the reality is that accounting for all the possible combinations of languages, libraries, and platforms needed to automatically run whatever random code someone happens to be working on is incredibly daunting.
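
To give a feel for the scale of the problem: even the narrowest version of idea (4), one language tested across a handful of interpreter versions, already needs something like the sketch below. This is an assumption-laden illustration (Docker available, official `python:<version>` images, a pip-installable project with a pytest suite), and a real matrix would have to multiply this across libraries, frameworks, and operating systems.

```python
import subprocess

# Assumed: Docker is installed and these official python image tags exist.
PYTHON_VERSIONS = ["3.9", "3.10", "3.11", "3.12"]


def test_across_versions(project_dir: str) -> dict[str, bool]:
    """Run the project's test suite inside a container per Python
    version and report which versions pass."""
    results = {}
    for version in PYTHON_VERSIONS:
        proc = subprocess.run(
            [
                "docker", "run", "--rm",
                "-v", f"{project_dir}:/app", "-w", "/app",
                f"python:{version}",
                "sh", "-c", "pip install -q -e . && python -m pytest -q",
            ],
            capture_output=True,
            text=True,
        )
        results[version] = proc.returncode == 0
    return results
```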

In other words, these are all good ideas, but they require huge breakthroughs in AI that are outside the scope of Cursor.
