Custom Models Set The Context Window to 1M

Where does the bug appear (feature/product)?

Cursor IDE

Describe the Bug

When adding custom models using an OpenAI-compatible API, Cursor sets the model’s context window to 1 million tokens by default without checking the response from the model backend.

This is a serious issue because most models on the market support at most 272K tokens, and some only 200K.

Steps to Reproduce

  1. Add an OpenAI-compatible backend using a base URL and API key.
  2. Add a custom model name.
  3. Use the custom model in chat.
  4. Observe the context indicator.

Expected Behavior

Cursor should either allow users to customize the model’s context window, as other IDEs do, or set a safer default maximum, such as 272K tokens.

Operating System

macOS

Version Information

Version: 3.3.22
VSCode Version: 1.105.1
Commit: 38a27120cfc7419a5efa38420665eaeeed1e7b30
Date: 2026-05-07T07:47:13.552Z
Layout: editor
Build Type: Stable
Release Track: Default
Electron: 39.8.1
Chromium: 142.0.7444.265
Node.js: 22.22.1
V8: 14.2.231.22-electron.0
OS: Darwin arm64 25.4.0

Does this stop you from using Cursor?

Yes - Cursor is unusable

Hey, thanks for the detailed report.

What you’re seeing is actually by design. For custom OpenAI-compatible models that aren’t in our catalog, we set 1M as a safe upper bound so we don’t accidentally set the limit too low for a model that really supports a large context. We don’t currently have auto-detection of the context window from the provider, and there also isn’t a UI yet to configure the limit for a custom BYOK model.

So it’s not a bug, but the request to let users set the context window for custom models manually, or at least use a more conservative default, is valid. There’s already a similar feature request here: Unlock Full Context Window with Own API Keys. It’s worth commenting there to increase visibility.

I can’t give an ETA for when this option will be available. If there are updates, we’ll follow up.

Thanks for the answer @deanrie

This opens up another issue: when using a model that does not actually support a 1M-token context window, auto-compaction fails and the chat stops automatically.

As a result, there is no practical way to use sub-agents and let Cursor continue working without manual intervention. The only workaround is to switch to Auto Mode, wait for auto-compaction to complete, and then switch back to the custom model, which is very inconvenient.

Yeah, that downstream effect is a fair point. With the 1M default in place, the auto-compaction threshold is calculated based on 1M. So if a model only supports 200 to 272K, the upstream API hits its real limit before our compaction kicks in, and the chat stalls. The Auto-Mode round trip is basically the only user-side workaround right now.
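
To make the arithmetic concrete, here is a minimal Python sketch of why compaction never fires in time. The 80% trigger ratio and the function names are assumptions chosen for illustration; they are not Cursor's actual values or internals.

```python
# Illustrative only: trigger_percent=80 is an assumed value, not Cursor's real setting.

def compaction_threshold(assumed_window: int, trigger_percent: int = 80) -> int:
    """Token count at which a client would start compacting the conversation."""
    return assumed_window * trigger_percent // 100


def fails_before_compaction(assumed_window: int, real_window: int,
                            trigger_percent: int = 80) -> bool:
    """True when the backend's real limit is reached before compaction triggers."""
    return real_window < compaction_threshold(assumed_window, trigger_percent)


if __name__ == "__main__":
    assumed = 1_000_000  # default for custom models, as described in this thread
    for real in (32_768, 200_000, 272_000, 1_000_000):
        status = "stalls first" if fails_before_compaction(assumed, real) else "compacts in time"
        print(f"real window {real:>9,}: {status}")
```

Under these assumptions, any backend whose real window is below the compaction threshold computed from 1M hits its own limit first, which matches the stall described above.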

It’s the same root cause as the original report. Without a way for custom OpenAI-compatible models to declare their actual context size, both the indicator and the compaction trigger are wrong. The best place to add this info is the same feature request thread: Unlock Full Context Window with Own API Keys. Calling out the auto-compaction failure helps with prioritization, since it turns this from a nice-to-have into a workflow blocker for sub-agents.

Thanks @deanrie

I suppose you won’t be able to provide an estimated timeline for this either, but could you at least resolve this issue [Bug] Images/vision completely broken with OpenAI BYOK + custom endpoint override (Unauthorized error) as soon as possible? We are encountering more bugs with each passing day, and we are considering bringing this up in social communities like Reddit because Cursor is slowly becoming an unstable product for us.

We are not asking for new features. Our last three tickets were major blockers, not feature requests, and we have been waiting for a fix from you for 19 days. We are unfortunately forced to use your product right now, so we urgently need you to find a solution. On top of everything, a new issue emerges every day, just like this one.

Please, at least fix either the GPT-5.5 BYOK not working or the BYOK image attachment problem.

farewell man..

Hi @deanrie and Cursor team,

I’m adding my voice to this thread because the context window issue is actively blocking my workflow — not as a theoretical edge case, but as a daily crash.

My setup

  • **OS:** Linux
  • **Local inference:** Ollama (http://localhost:11434/v1)
  • **Model:** Qwopus3.5-9B-Coder (GGUF, Q8_0)
  • **Hardware:** RTX 2070 Super 8GB VRAM / 16GB RAM (upgrading to RTX 3090 soon)
  • **Ollama num_ctx:** 32768 (verified via ollama ps)

The problem

Cursor assumes a **200K** (or **1M** for custom models) context window and packs conversation history, file context, tool definitions, and codebase indexing accordingly.

My local model physically supports **32K tokens**. When Cursor exceeds that limit:

  • Ollama logs: truncating input prompt, context limit hit — shifting
  • The model crashes, hangs, or returns garbage
  • Auto-compaction never triggers in time because Cursor calculates thresholds against 200K/1M, not the real backend limit

This is exactly what Mehmet_Baykar described in the original report — but for local Ollama users with 4K–32K windows, the failure is even more severe and happens much sooner.

What I need

A simple setting: **let users define the maximum context window per custom model** (e.g. 32768), so Cursor:

  1. Stops sending prompts larger than the backend can handle
  2. Triggers compaction **before** the model crashes
  3. Shows an accurate context indicator (X / 32K, not X / 1M)
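
As a rough illustration of the three points above, here is a minimal Python sketch of what a per-model override could look like. Everything in it is hypothetical: the `max_context_tokens` setting, the 80% compaction trigger, and the function names are assumptions for illustration, not an existing Cursor API.

```python
from dataclasses import dataclass
from typing import Optional

# Default described in this thread for custom models outside the catalog.
DEFAULT_ASSUMED_WINDOW = 1_000_000


@dataclass
class CustomModel:
    name: str
    # Hypothetical user setting; None means "fall back to the default".
    max_context_tokens: Optional[int] = None


def effective_window(model: CustomModel) -> int:
    """Use the user's override when it is set and positive, else the default."""
    if model.max_context_tokens is not None and model.max_context_tokens > 0:
        return model.max_context_tokens
    return DEFAULT_ASSUMED_WINDOW


def fits(prompt_tokens: int, model: CustomModel, reserved_output_tokens: int = 0) -> bool:
    """Point 1: refuse prompts that the backend cannot hold."""
    return prompt_tokens + reserved_output_tokens <= effective_window(model)


def should_compact(used_tokens: int, model: CustomModel, trigger_percent: int = 80) -> bool:
    """Point 2: trigger compaction against the real window (80% is an assumed ratio)."""
    return used_tokens >= effective_window(model) * trigger_percent // 100


def indicator(used_tokens: int, model: CustomModel) -> str:
    """Point 3: show usage against the effective window."""
    return f"{used_tokens:,} / {effective_window(model):,}"


if __name__ == "__main__":
    local = CustomModel("local-ollama-model", max_context_tokens=32_768)
    print(indicator(12_000, local))
    print(should_compact(27_000, local))
    print(fits(30_000, local, reserved_output_tokens=4_096))
```

With the override set to 32768, the same 27,000-token conversation that is far below the 1M-based threshold is flagged for compaction before the backend runs out of room.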

This is not a nice-to-have. Without it, **local models via Ollama are effectively unusable in Cursor Agent/Chat**.

How other agents already solve this

Every major alternative lets you cap context explicitly:

| Agent | How they handle context window |
| --- | --- |
| **Continue** | contextLength: 32768 in YAML config; used for pruning before send |
| **Kilo Code** | limit.context: 32768 in config + ollamaNumCtx in UI; triggers compaction |
| **Zed** | max_tokens: 32768 per model in settings; sent as num_ctx to Ollama |
| **Roo Code** | modelContextWindow override in provider settings |
| **Cline** | Context Window Size field in UI + respects Ollama num_ctx (v3.17.9+) |

Cursor is the **only** tool in this list that hardcodes 200K/1M and ignores the backend’s real limit.

Why this matters for paying customers

I pay for Cursor Pro. I want to use Cursor as my IDE — but I also want to run local models for privacy, cost, and offline work. Right now I cannot do both reliably.

The workaround (switch to Auto Mode, start new chats constantly, avoid codebase-wide context) is not a solution. It’s a workaround for a missing basic feature that competitors shipped years ago.

This thread has been open since May 8, marked as a bug report, with a team response saying it’s by design and pointing to a feature request with no ETA. The last reply was 20 days ago. Meanwhile, users are leaving — as Tom_Coustols noted in his last message here.

Final note

I am not asking for 1M context on local models. I am asking for the **opposite** — the ability to **lower** the assumed window to match my hardware (32K, 16K, 4K).

If this does not get prioritized and shipped, **I will cancel my Cursor subscription and move to Kilo Code / Continue**. I will not file the same report a second time. One thread, one chance — please treat this as a retention issue, not a feature wishlist item.

Thank you.

Hey, thanks for the detailed write-up. You described the local model case exactly right: the problem is basically the mirror image of the usual one. You don’t need to raise the context window; you need to lower the assumed window to match what your hardware can actually handle, and the current default of 1M for custom OpenAI-compatible models outside our catalog doesn’t allow that.

I can confirm the current behavior. For those models we set 1M as a safe upper bound, there’s no auto-detect of the real context window from the provider, and there’s also no UI yet to set the context window manually for a custom model. That’s why auto-compaction triggers against the wrong limit. For a local model with a 32K window, the prompt hits the real limit before compaction has time to kick in.

This is a known limitation. It’s not a bug in the sense of a regression, it’s a missing capability, and I can’t share an ETA for when the setting will be available. The most helpful thing you can do right now is add your case (local Ollama, num_ctx 32K, the “need to lower the window, not raise it” scenario) to the relevant feature request: Unlock Full Context Window with Own API Keys. Specific setups like yours, plus a breakdown of how Continue, Zed, and Cline handle this, really help us prioritize.

I get that for a local workflow this is a blocker, not just a wishlist item. I’ll note it that way.