Trying to use local LLMs with cursor make LLMs hallucinating.
For using a local LLM i used Ollama in first, then LM Studio and exposed their APIs to internet over ngrok because Cursor isnt allowing to use local network for API endpoints.
Most of models arent calling tools correctly, when they does they for example
Write code in the wrong directory
Show responses to copy/paste instead of editing code (LLM will tell you it isnt capable to edit code while model did it earlier successfully)
Dont correctly read codes, telling code isnt existing because misreading blobs or idk what
it could be very cool to let people using their own models correctly on Cursor..
Steps to Reproduce
Use Ollama/LM Studio and expose their APIs to internet over ngrok then add custom OpenAI API URl and key
Hey, thanks for the report. This isn’t a Cursor bug, it’s the models. Cursor’s agent harness, system prompt, and tool calling format are tuned for frontier models like Claude, GPT-5, Gemini 3, etc. Small local models like Gemma 4 and Qwen 3.5 are generally weaker at instruction following and tool calling, so they might edit files in one turn and forget in the next, mix up paths, or paste text instead of doing an edit. That’s model behavior, not Cursor.
BYOK via the OpenAI base URL override works best-effort with any OpenAI-compatible endpoint, but we can’t guarantee quality with an arbitrary model. Agent features depend a lot on how well the model can follow complex system prompts and the tool calling protocol.
There are already open feature requests for native local model support and LAN access without ngrok. Feel free to add a vote or comment here:
If you want a more reliable agent experience through your own endpoint, try larger models with stronger tool calling support, like bigger Qwen3-Coder variants, DeepSeek, etc. Results are usually noticeably better than small Gemma or Qwen models.