Local LLM usage

Hey, thanks for the report. This isn’t a Cursor bug, it’s the models. Cursor’s agent harness, system prompt, and tool calling format are tuned for frontier models like Claude, GPT-5, Gemini 3, etc. Small local models like Gemma 4 and Qwen 3.5 are generally weaker at instruction following and tool calling, so they might edit files in one turn and forget in the next, mix up paths, or paste text instead of doing an edit. That’s model behavior, not Cursor.

BYOK via the OpenAI base URL override works best-effort with any OpenAI-compatible endpoint, but we can’t guarantee quality with an arbitrary model. Agent features depend a lot on how well the model can follow complex system prompts and the tool calling protocol.

There are already open feature requests for native local model support and LAN access without ngrok. Feel free to add a vote or comment here:

If you want a more reliable agent experience through your own endpoint, try larger models with stronger tool calling support, like bigger Qwen3-Coder variants, DeepSeek, etc. Results are usually noticeably better than small Gemma or Qwen models.