Hi @deanrie and Cursor team,
I’m adding my voice to this thread because the context window issue is actively blocking my workflow — not as a theoretical edge case, but as a daily crash.
My setup
- **OS:** Linux
- **Local inference:** Ollama (http://localhost:11434/v1)
- **Model:** Qwopus3.5-9B-Coder (GGUF, Q8_0)
- **Hardware:** RTX 2070 Super 8GB VRAM / 16GB RAM (upgrading to RTX 3090 soon)
- **Ollama num_ctx:** 32768 (verified via ollama ps)
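For anyone reproducing this, here is a minimal sketch of how a client can pin the window per request through Ollama's native API, using the `options.num_ctx` field on `/api/generate`. This is the native endpoint, not the `/v1` OpenAI-compatible one I point Cursor at. The model tag and the helper name are placeholders for illustration, not my actual configuration.

```python
import json
import urllib.request

# Native Ollama endpoint (assumes the default local port).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str, num_ctx: int) -> urllib.request.Request:
    """Build (but do not send) a request that sets the context window for one call."""
    body = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},  # per-request context window
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "my-local-model" is a placeholder tag, not a real model name.
req = build_generate_request("my-local-model", "Hello", 32768)
print(json.loads(req.data)["options"])
# To actually send it: urllib.request.urlopen(req)
```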
The problem
Cursor assumes a **200K** (or **1M** for custom models) context window and packs conversation history, file context, tool definitions, and codebase indexing accordingly.
With num_ctx set to 32768, my local setup can handle at most **32K tokens**. When Cursor exceeds that limit:
- Ollama logs: truncating input prompt, context limit hit — shifting
- The model crashes, hangs, or returns garbage
- Auto-compaction never triggers in time because Cursor calculates thresholds against 200K/1M, not the real backend limit
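To make the threshold mismatch concrete, here is a small arithmetic sketch. The 80% compaction trigger is my assumption for illustration only; I don't know the actual threshold Cursor uses. The point holds for any percentage applied to the wrong window.

```python
BACKEND_LIMIT = 32_768  # what Ollama is actually configured to serve


def compaction_trigger(assumed_window: int, threshold_pct: int = 80) -> int:
    """Token count at which compaction would start (threshold_pct is an assumed value)."""
    return assumed_window * threshold_pct // 100


for assumed in (200_000, 1_000_000, BACKEND_LIMIT):
    trigger = compaction_trigger(assumed)
    in_time = trigger < BACKEND_LIMIT  # does compaction fire before the backend overflows?
    print(f"assumed window {assumed}: compaction at {trigger}, in time: {in_time}")
```

With a 200K or 1M assumption, the trigger sits at 160,000 or 800,000 tokens, far past the point where the backend has already overflowed. Only when the assumed window matches the real 32,768 does the trigger (26,214) land before the limit.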
This is exactly what Mehmet_Baykar described in the original report — but for local Ollama users with 4K–32K windows, the failure is even more severe and happens much sooner.
What I need
A simple setting: **let users define the maximum context window per custom model** (e.g. 32768), so Cursor:
- Stops sending prompts larger than the backend can handle
- Triggers compaction **before** the model crashes
- Shows an accurate context indicator (X / 32K, not X / 1M)
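A rough sketch of the behavior I mean, under stated assumptions. Everything here is hypothetical: `ModelLimits`, `plan_prompt`, the 4-characters-per-token estimate, the output reserve, and the 80% threshold are mine, not Cursor's internals. A real client would use the model's tokenizer and its own compaction logic.

```python
from dataclasses import dataclass


@dataclass
class ModelLimits:
    max_context: int            # user-defined window, e.g. 32768
    reserve_output: int = 4096  # assumed headroom kept free for the reply
    compact_pct: int = 80       # assumed compaction threshold, in percent of the budget


def estimate_tokens(text: str) -> int:
    # Crude heuristic (about 4 characters per token); not a real tokenizer.
    return max(1, len(text) // 4)


def plan_prompt(messages: list[str], limits: ModelLimits):
    """Keep the newest messages that fit the budget; report whether to compact."""
    budget = limits.max_context - limits.reserve_output
    kept, used = [], 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    dropped = len(kept) < len(messages)
    needs_compaction = dropped or used * 100 >= budget * limits.compact_pct
    indicator = f"{used} / {limits.max_context}"
    return kept, needs_compaction, indicator


history = ["x" * 4000] * 40  # 40 messages of roughly 1000 tokens each
kept, needs_compaction, indicator = plan_prompt(history, ModelLimits(max_context=32768))
print(len(kept), needs_compaction, indicator)
```

In this example only the 28 newest messages fit, compaction is flagged, and the indicator reads against 32768 instead of 1M.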
This is not a nice-to-have. Without it, **local models via Ollama are effectively unusable in Cursor Agent/Chat**.
How other agents already solve this
Every major alternative lets you cap context explicitly:
| Agent | How they handle context window |
| --- | --- |
| **Continue** | contextLength: 32768 in YAML config, used for pruning before send |
| **Kilo Code** | limit.context: 32768 in config + ollamaNumCtx in UI, triggers compaction |
| **Zed** | max_tokens: 32768 per model in settings, sent as num_ctx to Ollama |
| **Roo Code** | modelContextWindow override in provider settings |
| **Cline** | Context Window Size field in UI + respects Ollama num_ctx (v3.17.9+) |
Compared with these tools, Cursor is the **only** one that hardcodes 200K/1M and ignores the backend’s real limit.
Why this matters for paying customers
I pay for Cursor Pro. I want to use Cursor as my IDE — but I also want to run local models for privacy, cost, and offline work. Right now I cannot do both reliably.
The workaround (switch to Auto Mode, start new chats constantly, avoid codebase-wide context) is not a solution. It’s a workaround for a missing basic feature that competitors shipped years ago.
This thread has been open since May 8, marked as a bug report, with a team response saying it’s by design and pointing to a feature request with no ETA. The last reply was 20 days ago. Meanwhile, users are leaving — as Tom_Coustols noted in his last message here.
Final note
I am not asking for 1M context on local models. I am asking for the **opposite** — the ability to **lower** the assumed window to match my hardware (32K, 16K, 4K).
If this does not get prioritized and shipped, **I will cancel my Cursor subscription and move to Kilo Code / Continue**. I will not file the same report a second time. One thread, one chance — please treat this as a retention issue, not a feature wishlist item.
Thank you.