We have many choices these days when picking the right horse in this race. For coding, I’ve looked at LiveBench; would you agree that its ranking is accurate?
I’ve been using Claude 3.7 but it’s over-engineering these days.
Most hybrid thinking models have the same issue: they overdo it, or they apply reasoning that chooses plausible-sounding but inaccurate facts over the actual ones. It takes many changes to prompts / rules to keep them on track.
Overall, I agree that Claude 3.5 does better at generating code than 3.7.