Hey,
I’ve noticed Bugbot flags “major” issues that turn out to be hallucinations, usually because it’s missing context or relying on outdated docs.
For example, it recently warned me that the Stripe Checkout API doesn’t allow two fields to be used together, even though the official docs clearly say it does. It would be great to have a way to report these false positives with an explanation, so Bugbot can improve over time.
Completely agree with all of this. The false positives are a real pain point. I’ve had similar experiences where Bugbot flags things that are clearly valid according to current documentation. It makes it hard to trust the tool when you’re constantly second-guessing its findings.
The hallucination issue seems to stem from the model not having up-to-date knowledge of third-party APIs and libraries. A newer underlying model would go a long way here, and it’s frustrating not to know which model is even being used.
The lack of user control over the model is a bigger issue than it might seem. If Bugbot is positioned as a harness, users should be able to swap in a model they trust. That alone would address a lot of the reliability concerns.