For some reason GPT-3.5 plus a debugger is the winner in this benchmark, beating all the GPT-4 variations: HumanEval Benchmark (Code Generation) | Papers With Code. What are all the GPT-4 variations in that list, and where is Claude at all? I don’t understand. So GPT-3.5’s “reasoning and intellect” is absolutely “good enough” for any coding task in reality; the only thing it needs is either a larger context window or a ‘feedback loop’ (which I guess is what they did to score 1st place in the benchmark?).
It’s pretty sick to see a GPT-3.5 topping any benchmark chart; gonna read more about it. But it looks like we’re wasting our time hunting for ‘best and better’ models like Claude; that’s not where the secret is.
Look where Claude is on the benchmark: the open-source StarCoder (OctoCoder, a specially tuned StarCoder, in that table) beats it, taking 5th place vs Claude’s 9th, so why is Claude getting all the hype? I’ll try to find that OctoCoder on OpenRouter; maybe it can connect to Cursor, which seems like the best option.
Hi, have you added a model in this format: `anthropic/claude-3-opus`?
I didn’t succeed the first time; let me check it again tomorrow and I’ll give you the answer.
I’m currently experimenting with a local DeepSeek-Coder model (which can also run as the backend for Cursor, because LM Studio has a web-server option that makes it listen on localhost), and it’s surprisingly random: sometimes it nails the answer based on a chunk of docs I paste into its context (about 3k tokens of docs), and sometimes it misses and starts giving me things I didn’t ask for o_O. I’m playing with top_p, temperature, and other parameters, and hoping that a fine-tuned version of this model will work better for me than generic GPT-4 or GPT-3.5 (because I deal with the unknown new language “Verse” for Unreal Engine, which those models still don’t know anything about; it’s not included in their data sets). It’s definitely possible to work in Cursor with it and GPT-3.5, but I want 100% precision on any question asked in the chat, vs the 50/50 precision I get now. It should be possible with a local model, so wish me luck.
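To sketch that setup: LM Studio’s local server speaks an OpenAI-compatible API on localhost (port 1234 is its default; the model name, prompt layout, and parameter values below are just assumptions for illustration), so the docs-in-context experiment looks roughly like this:

```python
import json
from urllib.request import Request, urlopen

# LM Studio's local server default; check the Server tab for the actual port.
BASE_URL = "http://localhost:1234/v1"

def build_chat_request(docs: str, question: str,
                       temperature: float = 0.2, top_p: float = 0.9) -> dict:
    """Build an OpenAI-style chat payload that front-loads the pasted docs."""
    return {
        "model": "deepseek-coder",   # whatever model is loaded in LM Studio
        "temperature": temperature,  # lower values reduce the random misses
        "top_p": top_p,
        "messages": [
            {"role": "system",
             "content": "Answer strictly from this documentation:\n\n" + docs},
            {"role": "user", "content": question},
        ],
    }

def ask(docs: str, question: str) -> bytes:
    """POST the payload to the local server (needs LM Studio running)."""
    req = Request(BASE_URL + "/chat/completions",
                  data=json.dumps(build_chat_request(docs, question)).encode(),
                  headers={"Content-Type": "application/json"})
    return urlopen(req).read()
```

Pinning the docs into the system message and dropping the temperature is one way to push the model toward answering from the pasted context instead of improvising.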
Given that DeepSeek-Coder-V2 is now beating GPT-4o, can we reopen this issue and get local LLMs in here now? Seeing as the cited excuse was that no other OSS models were comparable yet.
I’m running it (or some version of it) locally in my terminal with Ollama? Idk, just doing `ollama run deepseek-coder-v2` and it runs (quite fast) on my M1 Mac. Good idea with OpenRouter! Thanks.
It is easy to use Ollama by changing the Override OpenAI Base URL setting, but ideally we should have a fourth provider, a custom OpenAI-compatible one, to be able to combine local models with those of OpenAI, which is closest to my workflow.
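A toy sketch of what such a fourth provider could do internally: pick the base URL per model, so local Ollama models and OpenAI models can live in one list. The model names here are assumptions, and 11434 is Ollama’s default port (it exposes an OpenAI-compatible API under `/v1`):

```python
# Hypothetical "custom OpenAI-compatible" provider: route each request to
# the right OpenAI-compatible endpoint based on the model name.
LOCAL_MODELS = {"deepseek-coder-v2", "codestral"}

def base_url_for(model: str) -> str:
    """Pick the OpenAI-compatible base URL for a given model name."""
    if model.split(":")[0] in LOCAL_MODELS:   # strip an Ollama tag like ":latest"
        return "http://localhost:11434/v1"    # local Ollama server
    return "https://api.openai.com/v1"        # everything else goes to OpenAI

print(base_url_for("deepseek-coder-v2"))  # http://localhost:11434/v1
print(base_url_for("gpt-4o"))             # https://api.openai.com/v1
```

The point is that the client keeps one credential/URL pair per provider rather than a single global override, which is exactly what the all-or-nothing Base URL setting can’t express.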
After installing Cursor and importing some of my most-used VSCode plugins, the very first thing I went to change was setting Cursor to use either my Ollama or TabbyAPI LLM server.
I was quite surprised to see there were no native options for Ollama, and the only OpenAI-compatible option was to override the base URL, which feels a bit all-or-nothing and doesn’t auto-populate the model list with your available models.
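Auto-populating that list would be cheap to do: Ollama’s native API reports installed models at `/api/tags` (that endpoint and the default port 11434 are from Ollama’s API docs; the sample response body below is made up for illustration). A minimal sketch:

```python
import json
from urllib.request import urlopen  # used only in the commented-out live fetch

def parse_model_names(tags_json: str) -> list[str]:
    """Extract model names from an Ollama /api/tags response body."""
    return [m["name"] for m in json.loads(tags_json)["models"]]

# With a local server running, a client would fetch the real list like:
#   body = urlopen("http://localhost:11434/api/tags").read().decode()
sample = ('{"models": [{"name": "deepseek-coder-v2:latest"},'
          ' {"name": "codestral:latest"}]}')
print(parse_model_names(sample))  # ['deepseek-coder-v2:latest', 'codestral:latest']
```

One GET on startup and the model dropdown could mirror whatever is actually pulled locally, instead of requiring models to be typed in by hand.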
There are really big advantages to being able to easily use local LLMs, especially if you’re already running them for multiple other tasks:
They can be very fast
They can have excellent domain-specific knowledge
They can be fine-tuned and customised
They can be tooling-augmented
They’re a LOT cheaper to use if you’re a heavy user
You don’t get rate limited (which always happens at the worst time)
They work offline and with poor internet connections
They respect privacy (which, with many of the clients I work with, is a requirement)
For example, DeepSeek-Coder-V2 and Codestral are two really fantastic models; between those two I get better-quality multi-shot code generation than I get from GPT-4o more than 50% of the time.
In VSCode, continue.dev and Tabby have pretty decent integration with both Ollama and OpenAI-compatible API endpoints as first-party citizens, but their extension features are not as nicely integrated into the IDE as Cursor’s.
By comparison, when I added my local Ollama OpenAI-compatible API endpoint to Cursor and manually added the models I mostly use, Cursor just errors with: