Local LLM usage

Where does the bug appear (feature/product)?

Cursor IDE

Describe the Bug

Trying to use local LLMs with cursor make LLMs hallucinating.

For using a local LLM i used Ollama in first, then LM Studio and exposed their APIs to internet over ngrok because Cursor isnt allowing to use local network for API endpoints.

Most of models arent calling tools correctly, when they does they for example

  • Write code in the wrong directory
  • Show responses to copy/paste instead of editing code (LLM will tell you it isnt capable to edit code while model did it earlier successfully)
  • Dont correctly read codes, telling code isnt existing because misreading blobs or idk what

it could be very cool to let people using their own models correctly on Cursor..

Steps to Reproduce

Use Ollama/LM Studio and expose their APIs to internet over ngrok then add custom OpenAI API URl and key

Operating System

Windows 10/11

Version Information

Cursor IDE 3.3.28

For AI issues: which model did you use?

Gemma 4, Qwen 3.5

Does this stop you from using Cursor

No - Cursor works, but with this issue

Hey, thanks for the report. This isn’t a Cursor bug, it’s the models. Cursor’s agent harness, system prompt, and tool calling format are tuned for frontier models like Claude, GPT-5, Gemini 3, etc. Small local models like Gemma 4 and Qwen 3.5 are generally weaker at instruction following and tool calling, so they might edit files in one turn and forget in the next, mix up paths, or paste text instead of doing an edit. That’s model behavior, not Cursor.

BYOK via the OpenAI base URL override works best-effort with any OpenAI-compatible endpoint, but we can’t guarantee quality with an arbitrary model. Agent features depend a lot on how well the model can follow complex system prompts and the tool calling protocol.

There are already open feature requests for native local model support and LAN access without ngrok. Feel free to add a vote or comment here:

If you want a more reliable agent experience through your own endpoint, try larger models with stronger tool calling support, like bigger Qwen3-Coder variants, DeepSeek, etc. Results are usually noticeably better than small Gemma or Qwen models.