How can I use a local LLM on my desktop/AI computer?

I have read a lot of posts saying that Ollama could once be used to run local (cheaper and faster) LLMs in Cursor. But it appears that this function has been disabled.

Does anyone have any idea how we can fix this?

Hey, thanks for the question.

Right now, Cursor doesn’t support direct connections to local models like Ollama running on localhost. The “Override OpenAI Base URL” option needs a publicly accessible HTTPS endpoint because all requests go through Cursor’s servers to build prompts.

There’s a workaround. You can use tunneling, like ngrok or Cloudflare Tunnel, to expose your local Ollama instance as a public HTTPS endpoint. Then use that URL in Cursor: Settings > Models > Override OpenAI Base URL.
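Once the tunnel is up, it can help to confirm that the public URL really reaches Ollama before pasting it into Cursor. The following is a minimal sketch, not an official procedure. It assumes Ollama's OpenAI-compatible API is served under the `/v1` path and that `/v1/models` returns a JSON object with a `data` list. The helper names `openai_base_url` and `list_models`, the placeholder bearer token, and the example hostname are all made up for illustration.

```python
import json
import urllib.request


def openai_base_url(tunnel_url: str) -> str:
    """Turn a public tunnel URL into an OpenAI-style base URL ending in /v1.

    Assumption: the tunnel forwards to a local Ollama instance that serves
    its OpenAI-compatible API under /v1.
    """
    base = tunnel_url.rstrip("/")
    if not base.startswith("https://"):
        # Per the reply above, the override needs a public HTTPS endpoint.
        raise ValueError("expected a public https:// URL")
    return base if base.endswith("/v1") else base + "/v1"


def list_models(tunnel_url: str, timeout: float = 10.0) -> list:
    """Fetch model IDs through the tunnel to check that it is reachable."""
    request = urllib.request.Request(
        openai_base_url(tunnel_url) + "/models",
        # Placeholder token (assumption): sent only so the request looks
        # like a normal OpenAI-style call.
        headers={"Authorization": "Bearer ollama"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return [model["id"] for model in payload.get("data", [])]
```

Calling `list_models("https://example.ngrok.app")` with your own tunnel hostname should return the models you have pulled locally. If it does, the value from `openai_base_url(...)` is a reasonable candidate for the Override OpenAI Base URL field.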

Related discussion: Setup Ollama (local model) in Cursor

The team is aware of requests for native support for local models without tunneling.

Because of the cost of API access, many users need to augment it with local LLMs. Please expand these capabilities. Cursor may need to support local LLMs in the future if API costs remain too high.

You could provide two options: one that routes through Cursor’s servers, and another that bypasses them entirely. As an added bonus, this would also save on your bandwidth costs.

Hey, thanks for the feedback. Routing through Cursor’s servers is currently an architectural requirement, not a bandwidth choice, so a full bypass isn’t possible right now. Prompt building, context retrieval, Cursor Tab, and Agent all run on our side, so even with a custom endpoint the request still goes through our backend.

Native local model support without tunneling is something we’re aware users want. We don’t have a concrete timeline for it yet.

For now, the working approach is the same: expose your local Ollama via ngrok or Cloudflare Tunnel and set the public HTTPS URL in Settings > Models > Override OpenAI Base URL.

If I configure a custom model whose name happens to overlap with a built-in Cursor model (for example, if I add a custom endpoint for gpt-5.5 and Cursor also offers gpt-5.5 natively), how does the system handle this conflict?

Why isn’t there a clear mechanism or UI distinction to separate built-in models from user-defined ones to prevent this routing ambiguity?

Most importantly: when I make an API request under this configuration, is it ultimately accessing my custom gpt-5.5 endpoint, or Cursor’s official gpt-5.5?

Good question. Routing here is decided by the API key, not the model name.

Here’s how it works: if you set your own OpenAI API key plus Override OpenAI Base URL, and that key is different from Cursor’s internal keys, then requests for OpenAI-family models (anything that’s not claude-* and not gemini-*) go to your custom endpoint. The model name (gpt-5.5 in your example) is just forwarded as-is. It’s not used as a signal to pick the route.
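The rule described above can be written as a small decision function. This is only an illustrative sketch of the description in this reply, not Cursor's actual code. The function name `route_request` and the return labels are hypothetical, and the check that the key differs from Cursor's internal keys is left out because it can't be reproduced from outside.

```python
from typing import Optional


def route_request(
    model: str,
    user_api_key: Optional[str],
    base_url_override: Optional[str],
) -> str:
    """Return "custom" or "cursor" according to the rule described above.

    Hypothetical sketch: the model name only decides whether the model counts
    as OpenAI-family. It never picks between two same-named models.
    """
    # "OpenAI-family" here means anything that is not claude-* or gemini-*.
    is_openai_family = not model.startswith(("claude-", "gemini-"))
    has_byok = bool(user_api_key) and bool(base_url_override)
    if has_byok and is_openai_family:
        return "custom"  # model name is forwarded as-is to the user's endpoint
    return "cursor"


# The overlapping name from the question goes to the user's endpoint.
print(route_request("gpt-5.5", "sk-user-key", "https://example.ngrok.app/v1"))
```

With no key or no override URL set, the same call returns `"cursor"`, which matches the "same slot, filled by either your key or ours" explanation.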

So to answer the main question: with BYOK + an override URL set up, you’re calling your own gpt-5.5 endpoint, not Cursor’s native one. There isn’t really a conflict. The same slot is filled by either your key or ours, depending on your settings.

On the UI split between built-in and custom models, that’s a fair point. I’ll pass it on as feedback. Right now there’s no clear visual label, and from settings it’s not always obvious where the request will go.

Small caveat: with a custom endpoint on /chat/completions, there’s a known issue right now with the payload format in some scenarios, but that’s not about routing itself.

You can take a look at your competitors; this is how they do it. They clearly differentiate between built-in models and custom models. The model selection is divided into two sections: built-in models at the top, and custom models at the bottom.

Got it, thanks for the specific example. It makes things clearer. I agree that splitting the list into sections (built-in at the top, custom at the bottom) removes ambiguity when model names overlap.

I’ll pass the feedback to the team. I can’t share a specific timeline or make any promises about UI changes, but the idea is clear and reasonable.