Support local LLMs

For some reason, GPT-3.5 plus a debugger is the winner in this benchmark, ahead of all the GPT-4 variations: HumanEval Benchmark (Code Generation) | Papers With Code. What are all the GPT-4 variations in that list, and where is Claude at all? I don’t understand. So GPT-3.5’s “reasoning and intellect” is apparently “good enough” for any real coding task; the only thing it needs is either a larger context window or a ‘feedback loop’ (which I guess is what they did to score 1st place in the benchmark?).
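To make the ‘feedback loop’ idea concrete, here’s a minimal sketch of a generate-run-repair loop (my guess at the approach, not the actual benchmark entry’s method; run_tests.py is a hypothetical test harness):

```python
# Illustrative generate-run-repair loop: test failures are fed back to the
# model as context. Not the benchmark entry's actual method.
import subprocess

from openai import OpenAI

client = OpenAI()  # standard OpenAI endpoint; swap base_url for local models

prompt = "Write a Python function solve() that ... Output only code."
for attempt in range(3):
    code = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content
    with open("candidate.py", "w") as f:
        f.write(code)
    # run_tests.py is a hypothetical harness that exits non-zero on failure
    result = subprocess.run(
        ["python", "run_tests.py"], capture_output=True, text=True
    )
    if result.returncode == 0:
        break  # tests pass, done
    # this is the feedback loop: the failure goes back into the prompt
    prompt += f"\n\nYour last attempt failed with:\n{result.stderr}\nFix it."
```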

It’s pretty sick to see a GPT-3.5 model topping any benchmark chart; gonna read more about it. But it looks like we’re wasting our time hunting for ‘better and best’ models like Claude; that’s not where the secret is.

Look where Claude is on the benchmark :man_shrugging:. The open-source StarCoder (OctoCoder, a specially tuned StarCoder, in that table) beats it: 5th place vs. Claude in 9th. So why is Claude all the hype? :thinking: I’ll try to find that OctoCoder on OpenRouter; maybe it can connect to Cursor. Seems like the best option.

Which model was best for you guys so far?

Tried the same for Claude 3, both with the API key starting with ‘sk-or-v1’ and with only the hash that comes after it; neither worked.

Unfortunately this currently doesn’t work for me. It would be a great addition!

Hi, have you added the model in this format: anthropic/claude-3-opus?
I didn’t succeed the first time; let me check it tomorrow and I’ll give you an answer.

I’m currently experimenting with a local DeepSeek-Coder model (which can also run as the backend for Cursor, since LM Studio has a webserver option that makes it listen on localhost), and it’s surprisingly random: sometimes it nails the answer based on a chunk of docs I paste into its context (about 3k tokens), and sometimes it misses and starts giving me things I didn’t ask for o_O. I’m playing with top_p, temperature and other parameters, hoping that a fine-tuned version of this model will work better for me than generic GPT-4 or 3.5 (I deal with an unknown new language, “Verse” for Unreal Engine, and those models still don’t know anything about it; it’s not in their data sets). It’s definitely possible to work in Cursor with it and GPT-3.5, but I want 100% precision on any question asked in the chat, vs the 50/50 precision I get now :smile:. That should be possible with a local model :crossed_fingers: wish me luck.
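Roughly, the setup looks like this (a minimal sketch, assuming LM Studio’s built-in server on its default port 1234 and its OpenAI-compatible /v1 API; the model name is just a placeholder for whatever is loaded):

```python
# Querying LM Studio's local OpenAI-compatible server.
# Assumes the default port 1234; LM Studio ignores the API key but the
# client requires a non-empty string.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="deepseek-coder",  # placeholder; LM Studio answers with the loaded model
    messages=[
        {"role": "system", "content": "Answer strictly from the provided docs."},
        {"role": "user", "content": "<~3k tokens of Verse docs>\n\nQuestion: ..."},
    ],
    temperature=0.2,  # lowering temperature/top_p seems to reduce the random misses
    top_p=0.9,
)
print(response.choices[0].message.content)
```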


Yes, I’ve also tried adding chat/completions to the URL.
[screenshot of the overridden OpenAI Base URL setting]

You must remove the trailing slash from the URL.
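Something like this should work (a minimal sketch, assuming OpenRouter’s standard endpoint and the model format mentioned above; use the full ‘sk-or-v1-…’ key, not just the hash):

```python
# OpenRouter as an OpenAI-compatible endpoint: base URL without a trailing
# slash, the full 'sk-or-v1-...' key, model in 'anthropic/claude-3-opus' format.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # no trailing slash
    api_key="sk-or-v1-...",                   # the full key, not just the hash
)

response = client.chat.completions.create(
    model="anthropic/claude-3-opus",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```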


I have the exact setup from your screenshot, but it isn’t working for me.
Is there anything else you’re doing to get it working?


Try restarting Cursor; for some reason, for many people it just starts working the next day :joy:


Configuring Cursor with OpenRouter.ai isn’t entirely straightforward, so I’ve prepared a step-by-step guide:


thanks boss!


With the recent update, you no longer need to use OpenRouter.ai or override the OpenAI Base URL.

Just add “claude-3-opus” as a custom model in the Settings page, and you’re ready to go:

  1. Open Cursor.
  2. Hit Ctrl+Shift+J to enter Cursor settings.
  3. Scroll to “OpenAI API”.
  4. Click on “Configure models” below the API key field.
  5. Click on “+ Add model” (below the model names).
  6. Enter “claude-3-opus” as the model name.
  7. Click on the “+” next to the field to add the model.
  8. Close the settings page.
  9. Use Cursor as before with the new Claude 3 Opus model.

Note: You have to add “just give me the code, nothing else, no explanation, just the code” to every prompt to make it work.


Hi, can you show how to use a local model from LM Studio?

Shouldn’t it work like shown above, with a local URL like

127.0.0.1/
localhost/

plus maybe a port number and some more path?

No, we make the requests from our server; we can’t call your local computer.


Given that DeepSeek Coder V2 is now beating GPT-4o, can we reopen this issue and get local LLMs in here now? Especially since the cited excuse was that no other OSS models were comparable yet.


DeepSeek V2 is possible with OpenRouter as a base URL: Reddit - Dive into anything.

We also plan to support it after confirming performance on internal evals and getting the model set up with our own inference provider.

But running it locally is infeasible given that it’s a 236B-parameter MoE.
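To put that in perspective, some rough weight-memory arithmetic (illustrative numbers, not from internal evals): an MoE activates only a few experts per token, but all of the weights still have to be resident in memory.

```python
# Rough memory footprint of a 236B-parameter model's weights (illustrative).
params = 236e9

fp16_gb = params * 2 / 1e9  # 2 bytes/param at 16-bit precision: ~472 GB
q4_gb = params * 0.5 / 1e9  # 0.5 bytes/param at 4-bit quantization: ~118 GB

print(f"FP16 weights: ~{fp16_gb:.0f} GB")
print(f"4-bit weights: ~{q4_gb:.0f} GB")
```

Either way, that’s far beyond typical consumer hardware.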


I’m running it locally in my terminal, or some version of it, with Ollama? Idk, I just do ollama run deepseek-coder-v2 and it runs (quite fast) on my M1 Mac. Good idea with OpenRouter, thanks!


It’s easy to use Ollama by changing the OpenAI Base URL override, but ideally there would be a fourth provider option, a custom OpenAI-compatible one, so that local models could be combined with OpenAI’s; that’s the closest to my workflow.
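For anyone trying the override route in the meantime, a minimal sketch of what it amounts to, assuming Ollama’s OpenAI-compatible API on its default port 11434 (Ollama ignores the API key, but the client needs one):

```python
# Pointing an OpenAI-style client at a local Ollama server.
# Assumes Ollama's default port 11434 and its /v1 OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="deepseek-coder-v2",  # must match a model pulled via 'ollama pull'
    messages=[{"role": "user", "content": "Explain this function: ..."}],
)
print(response.choices[0].message.content)
```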

New (potential) Cursor user here :wave: ,

After installing Cursor and importing some of my most-used VSCode plugins, the very first thing I went to change was setting Cursor to use either my Ollama or my TabbyAPI LLM server.

I was quite surprised to see there’s no native option for Ollama, and that the only OpenAI-compatible option is to override the base URL, which feels a bit all-or-nothing and doesn’t auto-populate the model list with your available models.
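The model list is trivially discoverable on these servers, which is why the lack of auto-population is surprising. A sketch of what I mean, assuming Ollama’s default endpoint (LM Studio behaves the same on its own port):

```python
# Auto-discovering available models from a local OpenAI-compatible server,
# i.e. the auto-population the base-URL override doesn't do.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

for model in client.models.list():
    print(model.id)  # e.g. 'deepseek-coder-v2:latest'
```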

There are really big advantages to being able to easily use local LLMs, especially if you’re already running them for multiple other tasks:

  • They can be very fast
  • They can have excellent domain-specific knowledge
  • They can be fine-tuned and customised
  • They can be tooling-augmented
  • They’re a LOT cheaper to use if you’re a heavy user
  • You don’t get rate-limited (which always happens at the worst time)
  • They work offline and over poor internet connections
  • They respect privacy (which is a requirement with many of the clients I work with)

For example, DeepSeek-Coder-V2 and Codestral are two really fantastic models; between those two, I get better-quality multi-shot code generation than I get from GPT-4o more than 50% of the time.

In VSCode, continue.dev and Tabby have pretty decent integration with both Ollama and OpenAI-compatible API endpoints as first-class citizens, but their extension features aren’t as nicely integrated into the IDE as Cursor’s.

By comparison, when I added either of my local OpenAI-compatible API endpoints to Cursor and manually added the models I mostly use, Cursor just errors with:
