Hi,
Just a thought…
When diffs are suggested, removals are in red and additions are in green.
What if there were a second-opinion dropdown in chat, where you could select a second model and the same prompt would be sent to that model behind the scenes?
You might see the diff suggestions from two different models (say Claude and O1) using different colour schemes on your source file.
If they agree on the edits, that’s useful feedback that the approach is valid. If they disagree, you can just stick with the model that’s heading in the better direction.
I know that would double your token quota usage, but it might be a useful tool occasionally when you want to double-check something important.
That’s a good idea, in theory, to increase accuracy.
Practically speaking, though, part of the problem is that it’s hard to tell from the output alone whether the models actually agree on the edits. i.e. determining “if they agree on the edits” is not trivial.
Two models may both suggest viable approaches to the same problem. You might be able to judge one solution as better than the other (or functionally equivalent) by introducing a third model as a judge (“LLM as judge”), but even that gets increasingly complex once you consider the edits within the broader application.
i.e. a solution may look more elegant when viewed in isolation, but within the context of your app the more complex solution may be the correct one.
Sorry, I wasn’t exactly clear. What I mean is that getting two suggestions from different AIs is great when you can easily evaluate both solutions and know in advance what the best solution is.
However, both AIs might offer novel solutions to the same problem. Except in basic cases, it is hard to know which solution is better or if the solutions are equivalent.
For example, you might ask the AI for code that manages the active state of a menu. Each AI might propose solutions that are functionally identical but look quite different at the code level.
It is very hard to evaluate this programmatically or using a third LLM (“LLM as judge”), except in simple cases.
Said differently: in simple cases, having two LLMs propose solutions for you can work. But in simple cases, this is not particularly useful.
It is more useful in complex cases. However, in complex cases, it is much harder to evaluate which of the LLMs’ solutions is functionally superior—or if the solutions are equivalent.