Local LLM usage

anoncoder555 · May 8, 2026, 6:52pm

Where does the bug appear (feature/product)?

Cursor IDE

Describe the Bug

Trying to use local LLMs with cursor make LLMs hallucinating.

For using a local LLM i used Ollama in first, then LM Studio and exposed their APIs to internet over ngrok because Cursor isnt allowing to use local network for API endpoints.

Most of models arent calling tools correctly, when they does they for example

Write code in the wrong directory
Show responses to copy/paste instead of editing code (LLM will tell you it isnt capable to edit code while model did it earlier successfully)
Dont correctly read codes, telling code isnt existing because misreading blobs or idk what

it could be very cool to let people using their own models correctly on Cursor..

Steps to Reproduce

Use Ollama/LM Studio and expose their APIs to internet over ngrok then add custom OpenAI API URl and key

Operating System

Windows 10/11

Version Information

Cursor IDE 3.3.28

For AI issues: which model did you use?

Gemma 4, Qwen 3.5

Does this stop you from using Cursor

No - Cursor works, but with this issue

deanrie · May 8, 2026, 7:06pm

Hey, thanks for the report. This isn’t a Cursor bug, it’s the models. Cursor’s agent harness, system prompt, and tool calling format are tuned for frontier models like Claude, GPT-5, Gemini 3, etc. Small local models like Gemma 4 and Qwen 3.5 are generally weaker at instruction following and tool calling, so they might edit files in one turn and forget in the next, mix up paths, or paste text instead of doing an edit. That’s model behavior, not Cursor.

BYOK via the OpenAI base URL override works best-effort with any OpenAI-compatible endpoint, but we can’t guarantee quality with an arbitrary model. Agent features depend a lot on how well the model can follow complex system prompts and the tool calling protocol.

There are already open feature requests for native local model support and LAN access without ngrok. Feel free to add a vote or comment here:

If you want a more reliable agent experience through your own endpoint, try larger models with stronger tool calling support, like bigger Qwen3-Coder variants, DeepSeek, etc. Results are usually noticeably better than small Gemma or Qwen models.

Topic		Replies	Views
How can I use a local LLM on my desktop/AI computer? Help byok	8	2088	May 26, 2026
Cursor blocking calls to local models Feature Requests	3	377	September 7, 2025
Run a local LLM model with cursor? Help byok	1	1365	April 2, 2026
Using Local LLMs with Cursor: Is it Possible? Discussions	32	95249	September 10, 2025
Trying to use local models, getting an error Bug Reports	2	464	August 18, 2025