I’ve been exploring the idea of using a locally installed Large Language Model (LLM) with Cursor instead of relying on cloud-based services. I’m particularly interested in using a Llama LLM for coding in the future. Has anyone else considered this or know if it’s feasible?
Current Situation
From what I understand, Cursor is designed to work with cloud-based AI services, specifically OpenAI’s API and Claude AI. This means we’re currently limited to using these cloud-based LLMs.
Potential Benefits of Local LLMs (like Llama)
Privacy: Keep sensitive code data local
Customization: Potentially fine-tune models for specific coding styles or domains
Offline Use: Work without an internet connection
Cost: Possibly reduce long-term costs for heavy users
Flexibility: Use open-source models like Llama that can be adapted for specific needs
Questions for the Community
Is there any way to integrate a local LLM (such as Llama) with Cursor currently?
If not, is this a feature the Cursor team has considered implementing?
What challenges might prevent this integration?
Would you be interested in using a local LLM with Cursor if it were possible?
Has anyone experimented with Llama or other open-source LLMs for coding tasks? What was your experience?
I’m planning to install a Llama LLM for coding in the future, and I’m curious if others have similar plans or experiences. If anyone from the Cursor development team is reading, we’d love to get your perspective on potentially supporting local LLMs alongside the current cloud-based options.
I have used a custom API with Cursor. You can use any API that follows the OpenAI API schema, including local ones. One small problem though: Cursor does not recognize localhost addresses. I've worked around this by tunnelling the local address through a cloud tunnel such as ngrok, then putting the URL ngrok gives me into the custom API endpoint in Cursor's model settings. If it helps, there's a quick sanity check sketched below.
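Here's a minimal sketch of that sanity check, assuming a local OpenAI-compatible server (for example Ollama or a llama.cpp server) exposed with `ngrok http <port>`; the tunnel URL and model name below are placeholders for whatever your setup actually reports. It just confirms the tunnelled endpoint speaks the OpenAI chat-completions schema before you paste it into Cursor's settings:

```python
# Verify the tunnelled endpoint answers OpenAI-style chat-completion requests.
from openai import OpenAI

client = OpenAI(
    base_url="https://example-tunnel.ngrok-free.app/v1",  # placeholder: the URL ngrok prints
    api_key="not-needed-locally",  # most local servers ignore the key, but the client requires one
)

resp = client.chat.completions.create(
    model="llama3.1:70b",  # placeholder: whatever name your local server registers
    messages=[{"role": "user", "content": "Say hello from my local model."}],
)
print(resp.choices[0].message.content)
```

If this prints a completion, the same base URL and model name should work in Cursor's custom API endpoint settings.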
I'm not using a local LLM directly, but I am running a LiteLLM proxy server locally. I use it to reroute custom models to several LLM providers such as TogetherAI, OpenRouter, Infermatic, etc. I frequently use Llama 3.1 70B Instruct and WizardLM-2 8x22B MoE. It can be used for cmd+K and chat, but unfortunately not for Composer or tab completion, as far as I know.
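For anyone curious what that rerouting looks like in code, here's a rough sketch using LiteLLM's Python SDK rather than the proxy server itself; the exact model identifiers and environment-variable names are best-effort assumptions, so check LiteLLM's docs for the providers you actually use:

```python
# One OpenAI-style call, routed to different providers by the model prefix.
import litellm

# Routed to TogetherAI (assumes a TogetherAI API key is set in the environment).
resp = litellm.completion(
    model="together_ai/meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",  # illustrative model ID
    messages=[{"role": "user", "content": "Write a Python one-liner to reverse a string."}],
)
print(resp.choices[0].message.content)

# Routed to OpenRouter instead (assumes an OpenRouter API key is set).
resp = litellm.completion(
    model="openrouter/microsoft/wizardlm-2-8x22b",  # illustrative model ID
    messages=[{"role": "user", "content": "Same task, different provider."}],
)
print(resp.choices[0].message.content)
```

The proxy server does essentially this behind an OpenAI-compatible endpoint, which is what Cursor's custom API setting talks to.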
Thanks for providing this information. I'm trying to add our LLM (llama3-70b), hosted on our on-prem server, to Cursor through the custom API. I can add the model to Cursor's model list by specifying an OpenAI base URL, but requests fail with "the model does not work with your current plan or api key", which is confusing. So it seems I need to expose it through ngrok as well?
Thanks for sharing. I have a follow-up question for @deanrie and the Cursor team: if we use a custom LLM with Cursor, do we still need a Cursor account or a paid plan? It looks like Cursor is the one actually making the requests to the LLMs.
To use your own API key, you don't need a PRO subscription, but you do need an account on the free plan. You can enter your API keys in the Cursor settings and use them. However, in this case you'll only have access to the main chat and the inline chat (cmd+K), and you won't be able to use Composer or Cursor Tab.
The actual call to the LLM is not made from the Cursor app but from their server (this is my guess, because each query goes through the Cursor server). There might be some RAG optimization or other prompt engineering that only happens there? Or perhaps result parsing to accommodate the git-like change visuals?
I think you're on the right path. I watched an interview with the founders, and they mentioned that codebase indexing happens on their side and is stored in their database. I also think they do some magic with prompts, call caching, etc. on their side, not in your local application. They've also said something about methods to index and transfer only encrypted data, but that doesn't seem to be how it's done right now.
So using Cursor with local LLMs diminishes the main points of using local LLMs?
I don't think offline use works with Cursor. But you can still use a local LLM while online, so that Cursor still uses their API for indexing, apply, etc., while your local LLM handles the main inference.
But the point of this is not to keep things offline; rather, it's to use your own model. For example, say I have a model extensively trained on a modding language for Crusader Kings 3. I'd want to use that model rather than the public GPT/Claude because I've fully fine-tuned it for my use case (points 2 and 5). Or you might have a cheaper, faster model, like the Infermatic ones I use often (point 4).
But yes, point 3 is impossible, and for point 1 we still need to trust Cursor's data policy.