Bug bot efficiency

I was reading the BugBot blog, and this section stood out to me:

After weeks of internal qualitative iterations, we landed on a version of Bugbot that outperformed other code review tools on the market and gave us confidence to launch. It used this flow:

  1. Run eight parallel passes with randomized diff order

  2. Combine similar bugs into one bucket

  3. Majority voting to filter out bugs found during only one pass

  4. Merge each bucket into a single clear description

  5. Filter out unwanted categories (like compiler warnings or documentation errors)

  6. Run results through a validator model to catch false positives

  7. Dedupe against bugs posted from previous runs

This sounds great, but seems expensive.
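For context, here's my rough reading of what the aggregation steps (2, 3, 4, 5, and 7) amount to. This is purely my own sketch with made-up bucket keys and thresholds, not Cursor's code; step 1 (the eight parallel passes) and step 6 (the validator model) would be extra LLM calls feeding in and out of this:

```python
from collections import defaultdict

# Step 5: categories to drop (names are my invention).
UNWANTED_CATEGORIES = {"compiler-warning", "documentation"}

def aggregate_passes(passes, min_votes=2, seen_before=frozenset()):
    """passes: one list per review pass, each item (bucket_key, category, description).

    bucket_key stands in for step 2's "combine similar bugs" — in reality
    that bucketing is presumably fuzzy/LLM-driven, not an exact key.
    """
    buckets = defaultdict(list)
    for findings in passes:
        voted = set()
        for key, category, desc in findings:
            if key in voted:          # a pass votes at most once per bucket
                continue
            voted.add(key)
            buckets[key].append((category, desc))

    report = []
    for key, hits in buckets.items():
        votes = len(hits)
        if votes < min_votes:         # step 3: drop bugs found in only one pass
            continue
        category, desc = hits[0]      # step 4: merge bucket into one description
        if category in UNWANTED_CATEGORIES:
            continue                  # step 5: filter unwanted categories
        if key in seen_before:        # step 7: dedupe against previous runs
            continue
        report.append((key, desc, votes))
    return report
```

Even in this toy form you can see the cost shape: every pass is a full review call over the diff, and the validator adds yet another model call per surviving bug.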

As an engineer working with LLMs and prompts for various tasks (besides coding assistants), I’m pretty aware of my prompt sizes, how many tool calls I’m making, and how that impacts cost and runtime. I’d love to run things multiple times, use an LLM evaluator, and merge the best results, but for me it’s too expensive (in both dollars and time).

Out of curiosity, a question for the Cursor team (and anyone else doing something similar): is this not as inefficient as it seems?