Using Local LLMs with Cursor: Is it Possible?

Hello Cursor community!

I’ve been exploring the idea of using a locally installed Large Language Model (LLM) with Cursor instead of relying on cloud-based services. I’m particularly interested in using a Llama LLM for coding in the future. Has anyone else considered this or know if it’s feasible?

Current Situation

From what I understand, Cursor is designed to work with cloud-based AI services, specifically OpenAI’s API and Anthropic’s Claude. This means we’re currently limited to these cloud-based LLMs.

Potential Benefits of Local LLMs (like Llama)

  1. Privacy: Keep sensitive code data local
  2. Customization: Potentially fine-tune models for specific coding styles or domains
  3. Offline Use: Work without an internet connection
  4. Cost: Possibly reduce long-term costs for heavy users
  5. Flexibility: Use open-source models like Llama that can be adapted for specific needs

Questions for the Community

  1. Is there any way to integrate a local LLM (such as Llama) with Cursor currently?
  2. If not, is this a feature the Cursor team has considered implementing?
  3. What challenges might prevent this integration?
  4. Would you be interested in using a local LLM with Cursor if it were possible?
  5. Has anyone experimented with Llama or other open-source LLMs for coding tasks? What was your experience?

I’m planning to install a Llama LLM for coding in the future, and I’m curious if others have similar plans or experiences. If anyone from the Cursor development team is reading, we’d love to get your perspective on potentially supporting local LLMs alongside the current cloud-based options.

Let’s discuss!

10 Likes

Hi @calmhaycool,

You can try this https://www.cursorlens.com/
But I’m not sure how well it works now.

4 Likes

I have used a custom API with Cursor. You can use any API that follows the OpenAI API schema, including local ones. One small problem, though: Cursor does not recognize localhost addresses. I’ve worked around this by tunnelling the local address through a cloud tunnel such as ngrok, then putting the URL ngrok gives me into the custom API endpoint field in Cursor’s model settings.
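
In case it helps, here’s a rough sketch of how to sanity-check that kind of setup before pointing Cursor at it. It assumes (my assumptions, not necessarily the exact setup above) a local server that speaks the OpenAI chat-completions schema, such as Ollama or llama.cpp’s server, already exposed through an ngrok tunnel, and uses the official openai Python client; the URL and model name are placeholders:

```python
# Sketch: confirm the tunnelled endpoint answers OpenAI-style chat completions
# before pasting it into Cursor's custom API / base URL settings.
# Assumes a local OpenAI-compatible server (e.g. Ollama on port 11434) exposed
# with something like `ngrok http 11434`; URL and model name are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-tunnel.ngrok-free.app/v1",  # the URL ngrok printed, plus /v1
    api_key="local-key",  # many local servers accept any value here
)

resp = client.chat.completions.create(
    model="llama3.1:70b-instruct",  # must match the name your local server reports
    messages=[{"role": "user", "content": "Reverse a string in Python in one line."}],
)
print(resp.choices[0].message.content)
```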

I’m not using a local LLM directly, but I’m running a litellm proxy server locally. I use it to reroute the custom model to several LLM providers such as TogetherAI, OpenRouter, Infermatic, etc. I frequently use Llama 3.1 70B Instruct and WizardLM-2 8x22B MoE. It can be used for cmd+K and chat, but unfortunately not for Composer or Tab completion, as far as I know.
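
For anyone curious what that rerouting looks like, here’s a minimal sketch using the litellm Python library directly (the setup described above runs the litellm proxy server instead; the provider prefixes, model names, and environment-variable API keys below are illustrative assumptions on my part):

```python
# Minimal sketch of the rerouting idea: the same OpenAI-style call is sent to
# different providers just by switching the model prefix. litellm reads the
# provider API keys from environment variables (e.g. TOGETHERAI_API_KEY,
# OPENROUTER_API_KEY). Model names are illustrative.
import litellm

for model in (
    "together_ai/meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    "openrouter/meta-llama/llama-3.1-70b-instruct",
):
    resp = litellm.completion(
        model=model,
        messages=[{"role": "user", "content": "Summarize what a litellm proxy does."}],
    )
    print(model, "->", resp.choices[0].message.content)
```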

2 Likes

Thanks for sharing it, I will give it a try. It could be a great backup with Llama 3.1 70B when the internet is down.

1 Like

Thanks for providing this information. I’m trying to add our LLM (llama3-70b), hosted on our on-prem server, through Cursor’s custom API. I can add the model to Cursor’s model list by specifying the OpenAI base URL, but requests fail with "the model does not work with your current plan or api key", which has me confused. So it seems I also need to expose it via ngrok?

Have you done it successfully?

Yes, it can be used:

  1. Input the model name correctly (it’s case- and whitespace-sensitive; see the sketch below for a quick way to check)
  2. Enable the OpenAI custom API
  3. Override the OpenAI base URL with your server’s URL (it needs to be publicly accessible)
  4. Change the model name in cmd+K or chat
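
For step 1, a quick way to get the exact model name is to ask the server which models it advertises. This is just a sketch, assuming your server implements the standard OpenAI /v1/models route; the URL is a placeholder:

```python
# List the model names the server actually serves, so the name entered in
# Cursor matches exactly (case- and whitespace-sensitive).
from openai import OpenAI

client = OpenAI(
    base_url="https://your-public-endpoint.example.com/v1",  # placeholder
    api_key="sk-anything",  # replace if your server enforces a real key
)

for m in client.models.list():
    print(m.id)  # copy one of these verbatim into Cursor's model list
```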

1 Like

Thanks for sharing, and I have a follow-up question for @deanrie and the Cursor team: if we use a custom LLM with Cursor, do we still need an account and payment with Cursor? It looks like Cursor is the one making the requests to the LLMs.

To use your own API key, you don’t need a Pro subscription, but you do need an account on the free plan. You can enter your API keys in the Cursor settings and use them. However, in this case you’ll only have access to the main chat and the inline chat (cmd+K), and you won’t be able to use Composer and Cursor Tab.

3 Likes

Thanks! This answers my question! We will evaluate more!

1 Like

I’m really trying to understand why the privately hosted LLM needs to be publicly accessible for this to work. Can anyone share the reasons for that?

The actual call to the LLM is made not by the Cursor app but by Cursor’s server (this is my guess, because each query goes through the Cursor server). There might be some RAG optimization or other prompt engineering that only happens there, or perhaps result parsing to produce the git-like change visuals.

1 Like

I think you’re on the right track. I watched an interview with the founders, and they mentioned that codebase indexing happens on their side and is stored in their database. I also think they do some magic with prompts, call caching, etc. that happens on their side, not in your local application. They’ve also said something about methods to index and transfer only encrypted data, but that doesn’t seem to be how it’s done right now.

So using Cursor with local LLMs diminishes the main points of using local LLMs?

I don’t think offline use works with Cursor. But you can still use a local LLM while online, so that Cursor still uses its API for indexing, apply, etc., while using your local LLM for the main inference.

The point isn’t that you want to keep things offline; it’s that you want to use your own model. For example: let’s say I have a model that’s extensively trained on the modding language for Crusader Kings 3. I’d want to use that model rather than the public GPT/Claude because I’ve fully fine-tuned it to my use case (point 2/5). Or you have a cheaper, faster model, like the one I often use via Infermatic (point 4).

But yes, point 3 is impossible, and for point 1 we have to trust Cursor’s data policy :man_shrugging:

I don’t see any good reason why this should not be doable, or made doable.

Any RAG or prompt optimization could be done client-side, and the cache either made local or cut entirely.

We should be able to use this on international flights, which is not possible right now.

4 Likes

I would love to see the native features that Cursor does so well extended to local LLMs. It would be a great addition to the Pro plan.

2 Likes

yeah same… getting on a flight soon and would love to use Cursor w/ a local coding model

would be cool if it at least worked w/ cmd+k inference; don’t need the agent stuff for now

Probably because it’s an easy way to generate synthetic data on indices and train off those.

I doubt there’s any other reason to keep it server side tbh

That would imply they’re using our data or metadata without compensating us, which is not great.