Using Local LLMs with Cursor: Is it Possible?

Hello Cursor community!

I’ve been exploring the idea of using a locally installed Large Language Model (LLM) with Cursor instead of relying on cloud-based services. I’m particularly interested in using a Llama LLM for coding in the future. Has anyone else considered this or know if it’s feasible?

Current Situation

From what I understand, Cursor is designed to work with cloud-based AI services, specifically OpenAI’s API and Claude AI. This means we’re currently limited to using these cloud-based LLMs.

Potential Benefits of Local LLMs (like Llama)

  1. Privacy: Keep sensitive code data local
  2. Customization: Potentially fine-tune models for specific coding styles or domains
  3. Offline Use: Work without an internet connection
  4. Cost: Possibly reduce long-term costs for heavy users
  5. Flexibility: Use open-source models like Llama that can be adapted for specific needs

Questions for the Community

  1. Is there any way to integrate a local LLM (such as Llama) with Cursor currently?
  2. If not, is this a feature the Cursor team has considered implementing?
  3. What challenges might prevent this integration?
  4. Would you be interested in using a local LLM with Cursor if it were possible?
  5. Has anyone experimented with Llama or other open-source LLMs for coding tasks? What was your experience?

I’m planning to install a Llama LLM for coding in the future, and I’m curious if others have similar plans or experiences. If anyone from the Cursor development team is reading, we’d love to get your perspective on potentially supporting local LLMs alongside the current cloud-based options.

Let’s discuss!

3 Likes

Hi @calmhaycool,

You can try this: https://www.cursorlens.com/
But I’m not sure how well it works now.

4 Likes

I have used a custom API with Cursor. You can use any API that follows the OpenAI API schema, including local ones. One small problem, though: Cursor does not accept a localhost address. I’ve worked around this by tunnelling the local address through a cloud tunnel such as ngrok, then putting the URL that ngrok gives me into the custom API endpoint in Cursor’s model settings.
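
If you want to sanity-check that routing outside of Cursor first, here’s a minimal sketch, assuming a local OpenAI-compatible server exposed through an ngrok tunnel; the tunnel URL, API key, and model name below are placeholders, not values from this post.

```python
# Minimal sketch: talk to a local OpenAI-compatible server through an ngrok
# tunnel, using the same base URL you would paste into Cursor's custom API
# settings. The tunnel URL and model name are hypothetical placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://example-subdomain.ngrok-free.app/v1",  # hypothetical ngrok URL
    api_key="not-needed",  # many local servers ignore the key, but the client requires one
)

response = client.chat.completions.create(
    model="llama-3.1-70b-instruct",  # hypothetical: whatever name your local server registers
    messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
)
print(response.choices[0].message.content)
```

If that works from another machine, the same URL should work as the OpenAI base URL override in Cursor.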

I’m not using a local LLM directly; instead, I’m running a LiteLLM proxy server locally. I use it to reroute custom models to several LLM providers such as TogetherAI, OpenRouter, Infermatic, etc. I frequently use Llama 3.1 70B Instruct and WizardLM-2 8x22B MoE. It can be used for cmd+K and chat, but unfortunately not for Composer or Tab completion, as far as I know.
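
To make that concrete, here is a rough sketch (not my exact setup) of how such a local LiteLLM proxy is typically called once it’s running; the port, model alias, and key below are assumptions.

```python
# Rough sketch: the LiteLLM proxy listens locally (port 4000 by default) and
# forwards requests to whichever provider the model alias maps to in its config.
# The alias and key below are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000",      # local LiteLLM proxy
    api_key="sk-placeholder-master-key",   # whatever key the proxy was started with
)

response = client.chat.completions.create(
    model="llama-3.1-70b-instruct",  # hypothetical alias defined in the proxy config
    messages=[{"role": "user", "content": "Explain list comprehensions in one paragraph."}],
)
print(response.choices[0].message.content)
```

Cursor would hit the same endpoint, just through a public tunnel as described above.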

2 Likes

Thanks for sharing it. I will give it a try. It could be a great backup with Llama 3.1 70B when the internet is down.

Thanks for providing this information. I’m trying to add our LLM (llama3-70b), hosted on our on-prem server, to Cursor through the custom API. I can add the model to Cursor’s model list by specifying the OpenAI base URL, but chatting with it fails with “the model does not work with your current plan or api key”, which confuses me. Does that mean I also need to expose it through ngrok?

Have you done it successfully?

Yes, it can be used:

  1. Input the model name correctly (it is case and whitespace sensitive; see the sketch after this list for a way to check the exact name)
  2. Enable the custom OpenAI API
  3. Override the OpenAI base URL with your server’s URL (it needs to be publicly accessible)
  4. Change the model name in cmd+K or chat
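
For step 1, a small sketch that may help: most OpenAI-compatible servers expose a models endpoint, so you can list the exact, case-sensitive names they register before typing one into Cursor. The URL and key below are placeholders.

```python
# Sketch: list the model IDs an OpenAI-compatible server exposes, so the name
# can be copied verbatim (case and whitespace intact) into Cursor's model list.
# The endpoint URL and key are hypothetical placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-public-endpoint.example.com/v1",  # hypothetical server URL
    api_key="placeholder-key",  # many self-hosted servers accept any non-empty key
)

for model in client.models.list():
    print(model.id)
```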

Thanks for sharing, and I have a follow-up question for @deanrie and the Cursor team: if we use a custom LLM with Cursor, do we still need a Cursor account/payment? It looks like Cursor is still the one sending the requests to the LLM.

To use an API key, you don’t need a Pro subscription, but you do need an account on the free plan. You can enter your API keys in the Cursor settings and use them. However, in this case, you’ll only have access to the main chat and the inline chat (cmd+K), and you won’t be able to use Composer and Cursor Tab.

1 Like

Thanks! This answers my question! We will evaluate more!

1 Like

I’m really trying to understand why the privately hosted LLM needs to be publicly accessible for this to work. Can anyone share the reasons for that?