For some reason GPT-3.5 plus a debugger is the winner in this benchmark, beating all the GPT-4 variations: HumanEval Benchmark (Code Generation) | Papers With Code. What are all the GPT-4 variations in that list, and where is Claude at all? I don’t understand. So GPT-3.5’s “reasoning and intellect” is absolutely “good enough” for any coding task in reality; the only thing it needs is either a larger context window or a ‘feedback loop’ (which I guess is what they did to score 1st place in the benchmark?).
It’s pretty sick to see a GPT-3.5 topping any benchmark chart; gonna read more about it. But it looks like we’re wasting our time hunting for ‘best and better’ models like Claude; that’s not where the secret is.
Look where Claude is on the benchmark: the open-source StarCoder (OctoCoder, a specially tuned StarCoder, in that table) beats it, taking 5th place vs Claude’s 9th, so why is Claude getting all the hype? I’ll try to find that OctoCoder on OpenRouter; maybe it can connect to Cursor, which seems like the best option.
Hi, have you added a model in this format: `anthropic/claude-3-opus`?
I didn’t succeed the first time; let me check it again tomorrow and I’ll give you the answer.
I’m currently experimenting with a local DeepSeek-Coder model (which can also run as the backend for Cursor, because LM Studio has a web-server option that makes it listen on localhost), and it’s surprisingly random: sometimes it nails the answer based on a chunk of docs I paste into its context (about 3k tokens of docs), and sometimes it misses and starts giving me things I didn’t ask for o_O. I’m playing with top_p, temperature, and other parameters, and hoping that a fine-tuned version of this model will work better for me than generic GPT-4 or GPT-3.5 (because I deal with the unknown new language “Verse” for Unreal Engine, which those models still don’t know anything about; it’s not included in their data sets). It’s definitely possible to work in Cursor with it and GPT-3.5, but I want 100% precision on any question asked in the chat, vs the 50/50 precision I get now. It should be possible with a local model, so wish me luck.
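To sketch that setup: LM Studio’s local server speaks an OpenAI-compatible API on localhost (port 1234 is its default; the model name, prompt layout, and parameter values below are just assumptions for illustration), so the docs-in-context experiment looks roughly like this:

```python
import json
from urllib.request import Request, urlopen

# LM Studio's local server default; check the Server tab for the actual port.
BASE_URL = "http://localhost:1234/v1"

def build_chat_request(docs: str, question: str,
                       temperature: float = 0.2, top_p: float = 0.9) -> dict:
    """Build an OpenAI-style chat payload that front-loads the pasted docs."""
    return {
        "model": "deepseek-coder",   # whatever model is loaded in LM Studio
        "temperature": temperature,  # lower values reduce the random misses
        "top_p": top_p,
        "messages": [
            {"role": "system",
             "content": "Answer strictly from this documentation:\n\n" + docs},
            {"role": "user", "content": question},
        ],
    }

def ask(docs: str, question: str) -> bytes:
    """POST the payload to the local server (needs LM Studio running)."""
    req = Request(BASE_URL + "/chat/completions",
                  data=json.dumps(build_chat_request(docs, question)).encode(),
                  headers={"Content-Type": "application/json"})
    return urlopen(req).read()
```

Pinning the docs into the system message and dropping the temperature is one way to push the model toward answering from the pasted context instead of improvising.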
Given that DeepSeek-Coder-V2 is now beating GPT-4o, can we reopen this issue and get local LLMs in here now? Seeing as the cited excuse was that no other OSS models were comparable yet.
I’m running it (or some version of it) locally in my terminal with Ollama? Idk, just doing `ollama run deepseek-coder-v2` and it runs (quite fast) on my M1 Mac. Good idea with OpenRouter! Thanks.
It is easy to use Ollama by changing the Override OpenAI Base URL setting, but ideally we should have a fourth provider, a custom OpenAI-compatible one, to be able to combine local models with those of OpenAI, which is closest to my workflow.
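A toy sketch of what such a fourth provider could do internally: pick the base URL per model, so local Ollama models and OpenAI models can live in one list. The model names here are assumptions, and 11434 is Ollama’s default port (it exposes an OpenAI-compatible API under `/v1`):

```python
# Hypothetical "custom OpenAI-compatible" provider: route each request to
# the right OpenAI-compatible endpoint based on the model name.
LOCAL_MODELS = {"deepseek-coder-v2", "codestral"}

def base_url_for(model: str) -> str:
    """Pick the OpenAI-compatible base URL for a given model name."""
    if model.split(":")[0] in LOCAL_MODELS:   # strip an Ollama tag like ":latest"
        return "http://localhost:11434/v1"    # local Ollama server
    return "https://api.openai.com/v1"        # everything else goes to OpenAI

print(base_url_for("deepseek-coder-v2"))  # http://localhost:11434/v1
print(base_url_for("gpt-4o"))             # https://api.openai.com/v1
```

The point is that the client keeps one credential/URL pair per provider rather than a single global override, which is exactly what the all-or-nothing Base URL setting can’t express.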
After installing Cursor and importing some of my most-used VSCode plugins, the very first thing I went to change was setting Cursor to use either my Ollama or TabbyAPI LLM server.
I was quite surprised to see there were no native options for Ollama, and the only OpenAI-compatible option was to override the base URL, which feels a bit all-or-nothing and doesn’t auto-populate the model list with your available models.
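Auto-populating that list would be cheap to do: Ollama’s native API reports installed models at `/api/tags` (that endpoint and the default port 11434 are from Ollama’s API docs; the sample response body below is made up for illustration). A minimal sketch:

```python
import json
from urllib.request import urlopen  # used only in the commented-out live fetch

def parse_model_names(tags_json: str) -> list[str]:
    """Extract model names from an Ollama /api/tags response body."""
    return [m["name"] for m in json.loads(tags_json)["models"]]

# With a local server running, a client would fetch the real list like:
#   body = urlopen("http://localhost:11434/api/tags").read().decode()
sample = ('{"models": [{"name": "deepseek-coder-v2:latest"},'
          ' {"name": "codestral:latest"}]}')
print(parse_model_names(sample))  # ['deepseek-coder-v2:latest', 'codestral:latest']
```

One GET on startup and the model dropdown could mirror whatever is actually pulled locally, instead of requiring models to be typed in by hand.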
There are really big advantages to being able to easily use local LLMs, especially if you’re already running them for multiple other tasks:
They can be very fast
They can have excellent domain-specific knowledge
They can be fine-tuned and customised
They can be tooling-augmented
They’re a LOT cheaper to use if you’re a heavy user
You don’t get rate limited (which always happens at the worst time)
They work offline and with poor internet connections
They respect privacy (which, with many of the clients I work with, is a requirement)
For example, DeepSeek-Coder-V2 and Codestral are two really fantastic models; between those two I get better-quality multi-shot code generation than I get from GPT-4o more than 50% of the time.
In VSCode, continue.dev and Tabby have pretty decent integration with both Ollama and OpenAI-compatible API endpoints as first-party citizens, but their extension features are not as nicely integrated into the IDE as Cursor’s.
By comparison, when I added my local Ollama OpenAI-compatible API endpoint to Cursor and manually added the models I mostly use, Cursor just errors with: