After the creators of the RepoQA benchmark implemented a new evaluation method, they found that Haiku performs exceptionally well. I was wondering if it would be possible to integrate Haiku with a context size of around 20-50k and allow for 10 times the number of requests compared to Opus (i.e., 10 Haiku requests for every 1 Opus request). This approach should still be more cost-effective overall.
Essentially, this would equate to 5,000 Haiku requests, as Haiku is 60 times cheaper than Opus. Even with a 50k context size, it would remain cheaper, and for 20k, it would be significantly more affordable.
Isn’t this already possible in the long context chat? Both Sonnet and Haiku are available there with a 200k context length, and Opus is also available there via your own API key using this method:
Yes, and it is generous of them to add more Haiku (20) / Sonnet (50) calls. However, I would like there to be an intermediate level, as I said, with a context size of ideally 50k and at least 20k. Empirically, quality deteriorates quite a bit once the context grows past 100k on the way to 200k, even though the models pass needle-in-a-haystack tests.
I don’t mind 1 Opus call counting as 5 Haiku calls, even though Haiku is 60x cheaper (which makes it 12x cheaper per call if the context goes from 10k to 50k).
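To make that arithmetic easy to check, here is a rough sketch. It assumes the 60x per-token price gap quoted in this thread and a 10k-context Opus call as the baseline, and it ignores any difference between input and output token pricing. The function name and parameters are made up for illustration.

```python
# Assumption taken from this thread: Haiku is about 60x cheaper than Opus per token.
PRICE_RATIO = 60


def haiku_calls_per_opus_call(opus_context_k, haiku_context_k, price_ratio=PRICE_RATIO):
    """Estimate how many Haiku calls cost about the same as one Opus call.

    Cost is treated as proportional to context size, so a larger Haiku
    context eats into the per-token price advantage.
    """
    return price_ratio * opus_context_k / haiku_context_k


if __name__ == "__main__":
    for haiku_k in (10, 20, 50):
        ratio = haiku_calls_per_opus_call(10, haiku_k)
        print(f"Haiku at {haiku_k}k vs Opus at 10k: ~{ratio:.0f} Haiku calls per Opus call")
```

Under those assumptions, a 5:1 exchange rate at 50k context would still leave a comfortable margin compared with the roughly 12:1 break-even point.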
This is quite far-fetched, but it would solve a big use case for me: quickly iterating on bugs with a large context window and a high chance of fixing them.