Custom Models Set The Context Window to 1M

Mehmet_Baykar · May 8, 2026, 2:39pm

Where does the bug appear (feature/product)?

Cursor IDE

Describe the Bug

When adding custom models through an OpenAI-compatible API, Cursor sets the model’s context window to 1 million tokens by default, without checking what the model backend actually supports.

This is a serious issue because most models on the market support at most 272K tokens, and some only 200K tokens.

Steps to Reproduce

Add an OpenAI-compatible backend using a base URL and API key.
Add a custom model name.
Use the custom model in chat.
Observe the context indicator.

Expected Behavior

Cursor should either allow users to customize the model’s context window, as other IDEs do, or set a safer default maximum, such as 272K tokens.

Operating System

macOS

Version Information

Version: 3.3.22
VSCode Version: 1.105.1
Commit: 38a27120cfc7419a5efa38420665eaeeed1e7b30
Date: 2026-05-07T07:47:13.552Z
Layout: editor
Build Type: Stable
Release Track: Default
Electron: 39.8.1
Chromium: 142.0.7444.265
Node.js: 22.22.1
V8: 14.2.231.22-electron.0
OS: Darwin arm64 25.4.0

Does this stop you from using Cursor

Yes - Cursor is unusable

deanrie · May 8, 2026, 3:15pm

Hey, thanks for the detailed report.

What you’re seeing is actually by design. For custom OpenAI-compatible models that aren’t in our catalog, we set 1M as a safe upper bound so we don’t accidentally set the limit too low for a model that really supports a large context. We don’t currently have auto-detection of the context window from the provider, and there also isn’t a UI yet to configure the limit for a custom BYOK model.

So it’s not a bug, but the request to let users set the context window for custom models manually, or at least use a more conservative default, is valid. There’s already a similar feature request here: Unlock Full Context Window with Own API Keys. It’s worth commenting there to increase visibility.

I can’t give an ETA for when this option will be available. If there are updates, we’ll follow up.

Mehmet_Baykar · May 8, 2026, 3:20pm

Thanks for the answer @deanrie

This opens up another issue: when using a model that does not actually support a 1M-token context window, auto-compaction fails and the chat stops automatically.

As a result, there is no practical way to use sub-agents and let Cursor continue working without manual intervention. The only workaround is to switch to Auto Mode, wait for auto-compaction to complete, and then switch back to the custom model, which is very inconvenient.

deanrie · May 8, 2026, 3:41pm

Yeah, that downstream effect is a fair point. With the 1M default in place, the auto-compaction threshold is calculated based on 1M. So if a model only supports 200 to 272K, the upstream API hits its real limit before our compaction kicks in, and the chat stalls. The Auto-Mode round trip is basically the only user-side workaround right now.
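A minimal Python sketch of the arithmetic described above. The 0.8 compaction ratio is an assumption (the thread never states Cursor’s actual trigger point), and the function names are illustrative only. With an assumed window of 1M, the threshold sits at 800K, far past a real 200K or 272K limit, so the backend limit is always hit first; with the real window plugged in, compaction fires in time.

```python
# Assumed compaction trigger: 80% of the window the client believes in.
# Cursor's real ratio is not stated in the thread; this value is illustrative only.
COMPACTION_RATIO = 0.8


def compaction_threshold(assumed_window: int, ratio: float = COMPACTION_RATIO) -> int:
    """Token count at which the client would start compacting."""
    return int(assumed_window * ratio)


def first_event(assumed_window: int, real_window: int) -> str:
    """Which happens first as the conversation grows: compaction or the backend limit."""
    if compaction_threshold(assumed_window) < real_window:
        return "compaction"  # compaction fires while the backend still accepts the prompt
    return "backend_limit"  # the backend rejects or truncates before compaction can fire


if __name__ == "__main__":
    for real in (200_000, 272_000, 32_768):
        print(f"real={real}: assumed 1M -> {first_event(1_000_000, real)}, "
              f"assumed real -> {first_event(real, real)}")
```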

It’s the same root cause as the original report. Without a way for custom OpenAI-compatible models to declare their actual context size, both the indicator and the compaction trigger are wrong. The best place to add this info is the same feature request thread: Unlock Full Context Window with Own API Keys. Calling out the auto-compaction failure helps with prioritization, since it turns this from a nice-to-have into a workflow blocker for sub-agents.

Mehmet_Baykar · May 8, 2026, 4:31pm

Thanks @deanrie

I suppose you won’t be able to provide an estimated timeline for this either, but could you at least resolve this issue: [Bug] Images/vision completely broken with OpenAI BYOK + custom endpoint override (Unauthorized error), as soon as possible? We are encountering more bugs with each passing day, and we are considering bringing this up in social communities like Reddit because Cursor is slowly becoming an unstable product for us.

We are not asking for new features. Our last three tickets were major blockers, not feature requests, and we have been waiting for a fix from you for 19 days. We are unfortunately forced to use your product right now, so we urgently need you to find a solution. On top of everything, a new issue emerges every day, just like this one.

Please, at least fix either the GPT-5.5 BYOK not working or the BYOK image attachment problem.

Tom_Coustols · May 15, 2026, 11:04pm

Farewell, man.

PeterAI · June 5, 2026, 9:52am

Hi @deanrie and Cursor team,

I’m adding my voice to this thread because the context window issue is actively blocking my workflow — not as a theoretical edge case, but as a daily crash.

My setup

**OS:** Linux
**Local inference:** Ollama (http://localhost:11434/v1)
**Model:** Qwopus3.5-9B-Coder (GGUF, Q8_0)
**Hardware:** RTX 2070 Super 8GB VRAM / 16GB RAM (upgrading to RTX 3090 soon)
**Ollama num_ctx:** 32768 (verified via ollama ps)

The problem

Cursor assumes a **200K** (or **1M** for custom models) context window and packs conversation history, file context, tool definitions, and codebase indexing accordingly.

My local model physically supports **32K tokens**. When Cursor exceeds that limit:

Ollama logs: truncating input prompt, context limit hit — shifting
The model crashes, hangs, or returns garbage
Auto-compaction never triggers in time because Cursor calculates thresholds against 200K/1M, not the real backend limit

This is exactly what Mehmet_Baykar described in the original report — but for local Ollama users with 4K–32K windows, the failure is even more severe and happens much sooner.

What I need

A simple setting: **let users define the maximum context window per custom model** (e.g. 32768), so Cursor:

Stops sending prompts larger than the backend can handle
Triggers compaction **before** the model crashes
Shows an accurate context indicator (X / 32K, not X / 1M)

This is not a nice-to-have. Without it, **local models via Ollama are effectively unusable in Cursor Agent/Chat**.
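The per-model setting requested above could work roughly like the Python sketch below. Everything in it is hypothetical: CONTEXT_OVERRIDES, resolve_window, Budget, fit_history and indicator are illustrative names, not Cursor settings or APIs, and the 4-characters-per-token estimate, the 4,096-token output reserve and the 0.8 compaction ratio are assumed values. It covers the three points in the list: trimming prompts to the real window, compacting before the limit, and showing usage against the real window.

```python
from dataclasses import dataclass

# The 1M fallback for uncatalogued custom models described in this thread.
DEFAULT_CUSTOM_WINDOW = 1_000_000

# Hypothetical user setting: per-model context caps, keyed by lowercased model name.
CONTEXT_OVERRIDES = {"qwopus3.5-9b-coder": 32_768}


def resolve_window(model: str) -> int:
    """Use the user's cap when one is configured, otherwise the 1M default."""
    return CONTEXT_OVERRIDES.get(model.lower(), DEFAULT_CUSTOM_WINDOW)


def estimate_tokens(text: str) -> int:
    # Crude assumption of ~4 characters per token; a real client would use the model's tokenizer.
    return max(1, len(text) // 4)


@dataclass
class Budget:
    window: int
    reserve_for_output: int = 4_096  # assumed room left for the model's reply
    compaction_ratio: float = 0.8  # assumed trigger point, as a fraction of prompt_limit

    @property
    def prompt_limit(self) -> int:
        return self.window - self.reserve_for_output

    def should_compact(self, used: int) -> bool:
        # With a ratio below 1 this fires while the prompt still fits under prompt_limit.
        return used >= int(self.prompt_limit * self.compaction_ratio)


def fit_history(messages: list[str], budget: Budget) -> list[str]:
    """Keep the newest messages (list is oldest-first) whose estimated total fits prompt_limit."""
    kept: list[str] = []
    used = 0
    for msg in reversed(messages):
        cost = estimate_tokens(msg)
        if used + cost > budget.prompt_limit:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))


def indicator(used: int, window: int) -> str:
    """Context indicator text such as '20K / 32K' (K = 1,000 tokens, rounded down)."""
    return f"{used // 1000}K / {window // 1000}K"


if __name__ == "__main__":
    budget = Budget(window=resolve_window("Qwopus3.5-9B-Coder"))
    history = ["x" * 40_000] * 5  # five messages of ~10K estimated tokens each
    trimmed = fit_history(history, budget)
    used = sum(estimate_tokens(m) for m in trimmed)
    print(len(trimmed), indicator(used, budget.window), budget.should_compact(used))
```

Dropping the oldest messages is only one pruning policy; a client would more likely summarize older turns during compaction than discard them outright.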

How other agents already solve this

Every major alternative lets you cap context explicitly:

| Agent | How they handle the context window |
|---|---|
| Continue | contextLength: 32768 in YAML config, used for pruning before send |
| Kilo Code | limit.context: 32768 in config + ollamaNumCtx in UI, triggers compaction |
| Zed | max_tokens: 32768 per model in settings, sent as num_ctx to Ollama |
| Roo Code | modelContextWindow override in provider settings |
| Cline | Context Window Size field in UI + respects Ollama num_ctx (v3.17.9+) |

Cursor is the **only** tool in this list that hardcodes 200K/1M and ignores the backend’s real limit.

Why this matters for paying customers

I pay for Cursor Pro. I want to use Cursor as my IDE — but I also want to run local models for privacy, cost, and offline work. Right now I cannot do both reliably.

The workaround (switch to Auto Mode, start new chats constantly, avoid codebase-wide context) is not a solution. It’s a workaround for a missing basic feature that competitors shipped years ago.

This thread has been open since May 8, marked as a bug report, with a team response saying it’s by design and pointing to a feature request with no ETA. The last reply was 20 days ago. Meanwhile, users are leaving — as Tom_Coustols noted in his last message here.

Final note

I am not asking for 1M context on local models. I am asking for the **opposite** — the ability to **lower** the assumed window to match my hardware (32K, 16K, 4K).

If this does not get prioritized and shipped, **I will cancel my Cursor subscription and move to Kilo Code / Continue**. I will not file the same report a second time. One thread, one chance — please treat this as a retention issue, not a feature wishlist item.

Thank you.

deanrie · June 5, 2026, 2:58pm

Hey, thanks for the detailed write-up. You described the local model case exactly right. You’re right that the problem is basically the mirror image of the usual one. You don’t need to raise the context window, you need to lower the assumed window to match what your hardware can actually handle, and the current default of 1M for custom OpenAI-compatible models outside our catalog doesn’t allow that.

I can confirm the current behavior. For those models we set 1M as a safe upper bound, there’s no auto-detect of the real context window from the provider, and there’s also no UI yet to set the context window manually for a custom model. That’s why auto-compaction triggers against the wrong limit. For a local model with a 32K window, the prompt hits the real limit before compaction has time to kick in.

This is a known limitation. It’s not a bug in the sense of a regression; it’s a missing capability, and I can’t share an ETA for when the setting will be available. The most helpful thing you can do right now is add your case (local Ollama, num_ctx 32K, the “need to lower the window, not raise it” scenario) to the relevant feature request: Unlock Full Context Window with Own API Keys. Specific setups like yours, plus a breakdown of how Continue, Zed, and Cline handle this, really help us prioritize.

I get that for a local workflow this is a blocker, not just a wishlist item. I’ll note it that way.

aldinokemal · June 16, 2026, 3:55pm

Hi Dean, it’s been almost 2 weeks.

Do you have any idea how we can set the context to 1M?

deanrie · June 16, 2026, 4:52pm

Hey, as of right now nothing has changed. There’s still no separate setting to manually set the context window for a custom OpenAI-compatible model, and there’s also no auto-detection of the real limit from the provider. I can’t share an ETA.

Quick note for your case: for custom models that aren’t in our catalog, the default is 1M. If you’re seeing less (like 200K), it’s most likely because the model name matches a catalog entry that’s capped lower. Can you tell me which model you’re using and what exact name you’re registering it under? I’ll take a look at what’s happening. As a workaround, you can try registering it under a name that isn’t in the catalog; then the default 1M should apply.

The most helpful thing right now is to add your case (model, provider’s real limit, and that you specifically need 1M) to this feature request: Unlock Full Context Window with Own API Keys. This helps with prioritization.

aldinokemal · June 16, 2026, 5:18pm

I’m using a stealth model (the name is pretty unique), but it’s restricted to 200K.

Let’s say the model name is ‘awesome-model’.

anuj · July 8, 2026, 1:30am

I hope it will be fixed soon.

anuj · July 8, 2026, 1:33am

I’ve already tried that: my model name differs from the one in the catalog, but my model with a 1M context still gets allotted a 200K context limit. I’m using GLM 5.2.

anuj · July 8, 2026, 1:39am

Can you please tell me your experience with Zed or any other better alternative you’re using? I am considering switching to Zed as the primary IDE due to the same issue here.

harry1993 · July 15, 2026, 9:12am

Have you found a way to set the context window?

anuj · July 15, 2026, 11:00pm

It’s not in Cursor’s interest to do it, because it would break the incentive to use their API inside their IDE: if you don’t use their API, they can’t train Composer, their AI model, this way or get data on how the product is being used. This way they can gatekeep the good UX from others, I believe. I did see that on the $60 plan you do get a 1M context window with the models, but I wonder whether that would also give me a 1M window for my own hosted models. Either way, they lost me as a customer here.
