Need Advice: Is Cursor AI Reliable for Production-Grade Code Reviews?

Hey folks,

I am currently exploring AI tools to assist with code reviews and came across Cursor. It looks super promising with its deep integration into coding workflows, especially the way it understands project context better than typical code editors with AI plugins.

That said, I wanted to ask the community: how reliable is Cursor when used for real-world, production-level code reviews? I am mainly working with TypeScript and Python projects, and I would like something that can catch potential bugs, offer architectural suggestions, and even help optimize logic, without introducing hallucinated code.

If you have used Cursor for a few weeks or longer, I would love to hear:

How accurate are its code suggestions?

Can it handle large, multi-file projects well?

Any performance or security concerns?

Would you trust it in a professional dev environment?

Appreciate any honest insights or even screenshots if you’ve got examples. Trying to gauge if it’s worth fully integrating into our dev workflow.

Thanks!

Marcelo

I’ve been using Cursor since November of last year. Your main concern really comes down to the quality of the LLM rather than Cursor itself, since Cursor is essentially a ‘bridge’. That said, I would argue that Cursor is the best at achieving production-level code or reviews because of its features, with Claude in second place. The problem with Cursor is its learning curve: engineers may need a month of learning before becoming effective, but that happens with any top-notch software. It’s easy to throw 50 files into Claude and Cursor and then say ‘Claude did it! Cursor struggles!’, but that’s comparing apples to oranges. What actually happens when rules are properly structured and the smallest context is given? A better result, and that’s where Cursor shines, giving the best value for money.
About security concerns: LLMs were trained on every type of code, so you’ll need to write multiple specific rules.
About code performance/reviews: it depends on the quality of the LLM and the quality of the context.
As an example, I write code with sonnet-4-thinking and do code documentation/review with o3. Here’s a review of a production-level RPC interface:

Performance Considerations

  • Asynchronous Operations: All I/O (RPC calls, Redis operations, health checks) are asynchronous using asyncio.

  • Response Caching: The RPCCacheManager significantly improves performance for applications that frequently call static methods, avoiding network latency entirely for cached hits.

  • Health Check Interval: The health_check_interval and rate_limit_check_interval in RedisConfig balance responsiveness against the overhead of frequent checks.

  • Timeouts: Provider-specific request timeouts and internal health check timeouts prevent indefinite hangs.

  • Redis Caching: Using Redis for health status avoids redundant checks by every process, but introduces Redis latency.

  • Rate Limiting: Prevents overloading providers but adds a small overhead for checking limits and potentially delays requests.
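
To make the caching and timeout points concrete, here is a minimal sketch of the pattern in plain asyncio (the names, the TTL, and the fake transport are illustrative, not the actual interface of the reviewed code):

```python
import asyncio
import json
import time
from typing import Any

# Illustrative in-memory cache for static RPC methods; the reviewed code
# uses an RPCCacheManager, this just shows the shape of the idea.
_cache: dict[str, tuple[float, Any]] = {}
CACHE_TTL = 30.0  # seconds, illustrative value


async def call_rpc(method: str, params: list) -> Any:
    """Stand-in for the real transport (e.g. an HTTP POST to a provider)."""
    await asyncio.sleep(0.1)  # simulate network latency
    return {"method": method, "params": params}


async def cached_rpc(method: str, params: list, timeout: float = 5.0) -> Any:
    key = json.dumps([method, params], sort_keys=True)
    hit = _cache.get(key)
    if hit and time.monotonic() - hit[0] < CACHE_TTL:
        return hit[1]  # cache hit: no network round-trip at all
    # A timeout prevents an indefinite hang on a stalled provider.
    result = await asyncio.wait_for(call_rpc(method, params), timeout=timeout)
    _cache[key] = (time.monotonic(), result)
    return result


if __name__ == "__main__":
    print(asyncio.run(cached_rpc("eth_chainId", [])))
```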

Security Model

  • RPC Endpoint Security: Relies on the configured provider URLs. Using HTTPS endpoints is crucial. Authentication is handled by the endpoint itself (e.g., API keys in URLs).

  • Redis Security: Network access to Redis should be restricted. Password authentication for Redis is supported by the client.

  • Configuration Security: Sensitive information like API keys in RPC URLs should be handled securely (e.g., via environment variables) and not hardcoded.
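
For the configuration point, the usual pattern looks something like this (the environment variable names are my own, not from the reviewed code; redis-py >= 4.2 ships the asyncio client):

```python
import os

import redis.asyncio as redis  # redis-py >= 4.2

# Keep the API-keyed RPC URL out of source control entirely.
RPC_URL = os.environ["RPC_URL"]  # e.g. an HTTPS provider URL with its key

# Password auth and optional TLS for Redis, also via the environment.
redis_client = redis.Redis(
    host=os.environ.get("REDIS_HOST", "localhost"),
    port=int(os.environ.get("REDIS_PORT", "6379")),
    password=os.environ.get("REDIS_PASSWORD"),
    ssl=os.environ.get("REDIS_SSL", "0") == "1",
)
```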

Future Improvements / Known Limitations

  • Circuit Breaker: The “suspect” status is a step towards a circuit breaker, but a more explicit pattern (e.g., keeping a provider sidelined for a minimum duration after multiple failures) could be implemented.

  • Dynamic Reconfiguration: Provider lists and configurations are loaded at startup. Adding support for dynamic updates without restarting the application could be beneficial.

  • Detailed Metrics: Expose more detailed metrics about provider selection counts, latencies, error rates, cache hit/miss ratios, and rate limiting events for monitoring.

  • Configurable Retry Logic: The retry logic in RPCProxyProvider (number of retries, backoff delay) could be made configurable.
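
On the configurable-retry point, I would expect it to end up roughly like this (RetryConfig and with_retries are hypothetical names, just sketching the shape):

```python
import asyncio
import random
from dataclasses import dataclass


@dataclass
class RetryConfig:
    # Illustrative knobs; the reviewed RPCProxyProvider hardcodes these today.
    max_retries: int = 3
    base_delay: float = 0.5  # seconds
    max_delay: float = 8.0


async def with_retries(coro_factory, cfg: RetryConfig = RetryConfig()):
    """Run coro_factory() with exponential backoff plus jitter."""
    for attempt in range(cfg.max_retries + 1):
        try:
            return await coro_factory()
        except (asyncio.TimeoutError, ConnectionError):
            if attempt == cfg.max_retries:
                raise  # retries exhausted, surface the error
            delay = min(cfg.base_delay * 2 ** attempt, cfg.max_delay)
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
```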


As normalnormie said, that mostly depends on the AI model you choose.
The better the AI model, the better it can review code.

Current premium models (Claude 4 Sonnet and similar) are very good at reviewing code. Python and TypeScript are very common programming languages, so the models have sufficient knowledge of them.

Note that some details depend on the frameworks you use and on whether you write tests, have static analysis, linters, etc. to prevent common mistakes or catch them.

AI can analyze and optimize logic, but there is no model that does not occasionally produce hallucinations. That also depends a lot on your prompt.

E.g., if you ask the AI to do something it doesn’t know or doesn’t have info about, it will try to give a helpful answer, which may be a hallucination. So make sure to tell the AI to inform you if it doesn’t know the answer or doesn’t have sufficient info.

Many here are using Cursor to code, write tests, review code and so on.

From my personal experience

  • latest models are very good and rarely hallucinate
  • however it takes a bit of time to understand how AIs work and how to avoid hallucinations or mistakes
  • Cursor handles large projects quite well.
  • Performance concerns usually depend on the extensions you use, and sometimes on very, very long chats.
  • Security concerns are no different from those of any IDE. The AI integration uses Zero Data Retention, so no prompt or code is used for training by AI providers, and you can set your account/team to Privacy mode so that no information is kept by Cursor for internal purposes either. Some features like User Rules, Memories, Background Agent, etc. require some storage of data/code for those purposes, but you can turn those features on/off, or set privacy to Legacy Privacy, which turns them off.

Many users are using Cursor in a professional dev environment (businesses and solo developers).

  • I typically dislike its “suggestions”, as in its attempts to rewrite my code. I used to be very angry at it, but setting up Cursor rules and periodically reminding it to make surgical changes helped a ton. Its suggestions are much better when you specifically ask for best practices in a topic you are unsure of.

  • My project is roughly 1.5 MB of LoC split over a bazillion files, plus the hacked open-source Lua game engine I use (Moai) as another project. It works surprisingly well, BUT you need to keep sessions focused. I typically ask it to document any change for its future self afterwards (one thing it does better than I do!).

  • Security concerns: no idea, but I typically don’t care that much about my source code going somewhere (it would probably be more efficient for someone to just go with Unity, unless they are making the exact same game as me, I suppose?).

  • I am a professional solo dev studio (I work with freelancers, but they only provide the art assets or other non-coding work), but would I trust it with coworkers? I cannot really say. It has done some very, very bad refactors that broke everything. They were easy to roll back, though. When Claude did the same, it had no built-in “cancel your worthless changes, you useless piece of junk” button, and I had to fall back to my previous daily backup.

As for debugging itself, LLMs are all pretty useless at it: they will spot typos, but the training process was done over code results, and I doubt an LLM could have any experience with the actual debugging process of stepping through memory, variables, and all. All AI models will offer assumptions about what is wrong that are usually incorrect, as they don’t have the ability to use or access debugging tools (but they can be great at catching typos or similar issues!).

However, enabling some print-to-file statements and asking it to analyze the output worked well.
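
Something as simple as this gives the model plenty to work with (my project is Lua, but I’ll sketch it in Python since that’s the language this thread is about; the file name and format are just my setup):

```python
import logging

# Dump state to a file, then attach debug.log in a chat and ask the
# model to spot anomalies in the values rather than guess at the code.
logging.basicConfig(
    filename="debug.log",
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(funcName)s: %(message)s",
)

logging.debug("player_pos=%s velocity=%s", (12, 40), (0.0, -9.8))
```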

Bottom line: if you plan to use an LLM, you might as well go with Cursor. If you think what you are working with is too sensitive and shouldn’t be touched by an LLM, then Cursor won’t change that.
I make sure to have it write/change as little as possible every time, and it has served me well so far. I typically use Sonnet for UI and high-level tasks, and Opus for more complex ones. I use a few prompts to ask it to read the relevant files and docs and tell me how it would approach the issue, without writing any code.

That said, Cursor availability could be an issue (I set the overspend cap at 50 before; I don’t have enough data points with the new pricing model to fully grasp whether it would be limiting in the long run).

The game is named Zodiac Legion if you want to check whether it would qualify as a real-world project.
