How do you test and review Cursor-created code?

Maybe it is my OCD, but let me throw it out here in the open: do you review your code? How? How do you know it works?

For example, I compared how I work with Cursor vs. how I work with Claude Code. Cursor is great for ‘vibing’: ‘do that, I’m off to the toilet (ok, I don’t really do that) and I’ll check it when I’m back’. When I’m back, it has, let’s say, added some 10 methods across 10 files. I use rules, and I believe the result looks the same as the rest of my code.

Then I write unit tests and learn that there are subtle issues: transactions not handled properly in places where it is crucial that they are (even though I have it coded and described in the rules as clearly as possible, it still happens), or it suddenly forgets to even use the context. And that’s fine, I catch it there, fix it, move on. It’s kind of inverted TDD, because a proper TDD approach with Cursor would just be too expensive the way Cursor works nowadays. And just to mention: I’m talking about Claude 4 Sonnet, non-thinking.
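
To make that concrete, here is a minimal sketch of the kind of after-the-fact test that catches a missing transaction. The `transfer_funds` function and the session interface are made-up stand-ins, not my actual code; the point is just asserting commit-on-success and rollback-on-failure against a mocked session:

```python
# Hypothetical example: transfer_funds and the session API are stand-ins,
# not real project code.
from unittest.mock import MagicMock

import pytest


def transfer_funds(session, from_id, to_id, amount):
    """Code under test: must commit on success and roll back on any error."""
    try:
        session.execute(
            "UPDATE accounts SET balance = balance - :a WHERE id = :i",
            {"a": amount, "i": from_id},
        )
        session.execute(
            "UPDATE accounts SET balance = balance + :a WHERE id = :i",
            {"a": amount, "i": to_id},
        )
        session.commit()
    except Exception:
        session.rollback()
        raise


def test_commits_on_success():
    session = MagicMock()
    transfer_funds(session, 1, 2, 100)
    session.commit.assert_called_once()
    session.rollback.assert_not_called()


def test_rolls_back_on_failure():
    session = MagicMock()
    session.execute.side_effect = RuntimeError("db error")
    with pytest.raises(RuntimeError):
        transfer_funds(session, 1, 2, 100)
    session.rollback.assert_called_once()
    session.commit.assert_not_called()
```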

With Claude Code my mode of work is a bit different: I don’t open it unless I am fully focused and have the time to stay fully focused. That means I use it a couple of hours per day, max, but the quality of the code and the work is much higher, because I review every single change before moving on to the next one. After a couple of hours of coding like that (where I catch myself fixing code by hand while Claude is checking tests and doing analysis at the same time) I’m tired, so I don’t need Claude Code for 12 hours a day, whereas with Cursor I could let it do its thing while I’m doing (or not doing) something else.

Which brings me to the question: how are you all testing your stuff? Are you? Do you know it even works? Are you using it in production?

My User Rules instruct the Agent to produce a report at the end of its work. The AI does not always fully follow the structure, but in most situations the changes are understandable. I also often monitor the process itself, occasionally interrupting or redirecting the Agent.

I also set up strict linters and static analyzers (I’m currently developing Agent Enforcer based on this methodology).

I use TDD when I can’t solve a problem quickly. Both the tests and the code are written by the Agent.

That’s a great example: how do you actually test what you made there? And how do you trust the AI that it works the way it should? I have a waaaaaaay smaller example than what you wrote: trying to enforce something as simple as a transaction pattern with specific code examples in the rules, and the agent doesn’t follow it in at least ~30-40% of cases.
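
For reference, the pattern I’m trying to pin down in the rules is roughly this simple (a simplified sketch with made-up names, not the actual rule content):

```python
# Simplified sketch of the transaction pattern spelled out in the rules;
# the names here are illustrative only.
from contextlib import contextmanager


@contextmanager
def transaction(session):
    """Commit if the block succeeds, roll back and re-raise otherwise."""
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise


# Expected usage, which the agent still skips in a noticeable share of edits:
# with transaction(session):
#     session.add(order)
#     session.add(audit_entry)
```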

You can clone AgentDocstring and examine its pytest tests. There may even be a check.py at the root.