New benchmark shows gpt-4-0125-preview is lazier than the previous version

I wanted to share with you folks, mainly with the Cursor team, this new benchmark that just came out: Lazy coding benchmark for gpt-4-0125-preview | aider

Basically, the benchmark indicates that the new model is actually lazier than the previous one.

Also, if you check the OpenAI Discord, X, Reddit, or any other social media, you will see similar complaints: the Turbo models are lazy, hallucinate a lot, give wrong answers, and don't follow instructions.

At my company, I was considering getting a Cursor license for the whole team, but I gave up on the idea because we are being forced to use the new Turbo models. It's much easier to just get an OpenAI API key and use the good old model directly, though honestly I'd rather give my money to Cursor than to OpenAI.
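For reference, here is roughly what that direct-API fallback looks like with the OpenAI Python SDK. Note that gpt-4-0613 is my assumption for "the good old model" (the last pre-Turbo snapshot I'm aware of); swap in whichever pinned snapshot you prefer:

```python
# Minimal sketch: calling a pinned pre-Turbo snapshot directly through the
# OpenAI Python SDK. gpt-4-0613 is an assumption for "the good old model".
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-0613",  # pinned snapshot, not the floating gpt-4-turbo alias
    messages=[
        {"role": "system", "content": "You are a coding assistant. Write complete code, no placeholders."},
        {"role": "user", "content": "Implement a function that deduplicates a list while preserving order."},
    ],
)
print(response.choices[0].message.content)
```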

Someone here told me that fast GPT-4 requests still use the old model, but I don't think that's true. At least on my account it was using the Turbo model while I was a Pro subscriber, since the model's knowledge was updated through April 2023.
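If you want to sanity-check which snapshot you're actually talking to, here's a rough sketch. For direct API calls, the response's model field reports the resolved snapshot; inside a product like Cursor you can only ask the model about its cutoff, which is a weak heuristic since models often misreport their own training data:

```python
# Rough sanity check for which snapshot is serving your requests.
# response.model is authoritative for direct API calls only, not for
# requests proxied through a product like Cursor; self-reported cutoff
# dates are just a heuristic.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",  # floating alias; the response reveals the resolved snapshot
    messages=[{"role": "user", "content": "What is your knowledge cutoff date?"}],
)
print("resolved model:", response.model)
print("self-reported cutoff:", response.choices[0].message.content)
```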

"Fast GPT-4" just means "priority GPT-4 in our queue", which is a misleading name.

Edit: It's getting confusing; there is also the "cursor-fast" model, a fine-tuned GPT-4, IIRC…

Is there any benchmark on the implementation side, i.e., comparing different code-copilot platforms running the same underlying model?
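A toy version wouldn't be hard to build: run identical prompts through each platform and count placeholder comments in the generated code, roughly in the spirit of aider's laziness metric. The patterns below are illustrative assumptions, not aider's actual list:

```python
# Toy sketch: flag "lazy" placeholder comments where a model skipped real
# work, so the same prompt set can be compared across platforms.
# The patterns are illustrative assumptions, not aider's actual list.
import re

LAZY_PATTERNS = [
    r"#\s*\.\.\.",                              # "# ..."
    r"#\s*(rest|remainder) of (the )?code",     # "# rest of the code"
    r"#\s*implement(ation)?( goes| left)? here",
    r"#\s*TODO",
    r"//\s*\.\.\.",                             # C-style equivalent
]

def count_lazy_placeholders(code: str) -> int:
    """Count placeholder comments suggesting the model elided real work."""
    return sum(
        len(re.findall(pattern, code, flags=re.IGNORECASE))
        for pattern in LAZY_PATTERNS
    )

# Usage: collect completions for the same prompt from each platform, then compare.
samples = {
    "platform_a": "def dedupe(xs):\n    # ... rest of the code ...\n",
    "platform_b": "def dedupe(xs):\n    seen = set()\n    return [x for x in xs if not (x in seen or seen.add(x))]\n",
}
for name, code in samples.items():
    print(name, count_lazy_placeholders(code))  # platform_a: 1, platform_b: 0
```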