Auto-summarization problem with custom OpenAI proxy

Cursor IDE (v2.6.19, macOS arm64)

When using an OpenAI Base URL override (pointing to a custom proxy), Cursor does not perform context summarization when the context window fills up. Instead, the conversation crashes with a model_max_prompt_tokens_exceeded error.

Here’s what I’ve observed by intercepting traffic through my proxy:

  1. Cursor sends the full conversation history on every request (no incremental updates)

  2. As the conversation grows, the messages[] array keeps expanding (I’ve seen it reach 370+ messages / 128K tokens)

  3. When the upstream API returns 400 - prompt token count exceeds the limit, Cursor does not attempt to summarize or compress the context

  4. The conversation simply breaks — no recovery, no automatic summarization

My proxy implements its own summarization fallback (using a local LLM), which successfully compresses the context and retries. However, on the next user message, Cursor again sends the full uncompressed history, hitting the same limit.
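For context, the proxy-side fallback I described is roughly the following sketch. All names here (compress_history, summarize_locally, the token budget, and the 4-chars-per-token estimate) are my own illustrative choices, not anything from Cursor:

```python
# Sketch of a proxy-side fallback: when the upstream model rejects the prompt
# for being too long, compress older turns into a summary and retry.
# TOKEN_BUDGET and KEEP_RECENT are illustrative values, not real limits.

TOKEN_BUDGET = 4000      # hypothetical limit for the example
KEEP_RECENT = 6          # always keep the last N non-system messages verbatim

def estimate_tokens(messages):
    # Crude heuristic: roughly 4 characters per token.
    return sum(len(m["content"]) // 4 for m in messages)

def summarize_locally(messages):
    # Stand-in for a call to a local LLM; here we just join and truncate.
    joined = " | ".join(m["content"] for m in messages)
    return "Summary of earlier turns: " + joined[:200]

def compress_history(messages):
    """Keep the system prompt and recent turns; summarize the middle."""
    if estimate_tokens(messages) <= TOKEN_BUDGET:
        return messages
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    dropped, kept = rest[:-KEEP_RECENT], rest[-KEEP_RECENT:]
    summary = {"role": "system", "content": summarize_locally(dropped)}
    return system + [summary] + kept
```

This works for a single retry, but (as noted above) the compression never makes it back into Cursor's own conversation state, so the next turn starts from the full history again.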

I assume I'm missing some integration step on my end.

Thanks in advance.

Hey there,

You didn’t miss anything; this is a known limitation when using the Override OpenAI Base URL option with a custom proxy.

Two things are going on:

1) Auto-summarization doesn’t trigger
Cursor only retries with summarization when it recognizes specific “context too long” errors from supported providers. When your proxy returns a different error (e.g., model_max_prompt_tokens_exceeded), it isn’t matched, so no retry happens, and the request fails.

2) Internal ops may be misrouted
Even proactive summarization (which should run before hitting the limit) can get routed through your custom endpoint instead of Cursor’s infrastructure. If your proxy doesn’t support the summarization model Cursor uses, it fails silently.

Also expected: your proxy’s own summarization won’t persist across turns. Cursor maintains the conversation state, so it keeps sending the full, uncompressed history on each request.

Workarounds for now:

  • Start a new conversation around ~60–70% context usage

  • Try /summarize before hitting the limit (may still fail depending on proxy support)

  • Use Cursor’s native routing for longer threads; reserve the Base URL override for shorter sessions

The team is aware of these BYOK summarization gaps. No firm timeline yet, but it’s being tracked.


Thanks a lot.

> recognizes specific “context too long” errors from supported providers.

Is this documented anywhere, maybe on the provider’s website?

I could mimic similar behavior at the proxy level if I had access to that specific string/trigger. For now I have implemented a sliding window + summary, but the native mechanism would be much better.
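One experiment I may try: the official OpenAI API documents a 400 error with code "context_length_exceeded" for oversized prompts, so the proxy could rewrite its own limit error into that documented shape before returning it. Whether Cursor actually keys off this code is an assumption on my part; the translate_error function and our custom upstream code are my own naming:

```python
# Hypothetical proxy-side translation: rewrite our custom limit error into
# the error body shape the official OpenAI API documents for oversized
# prompts (HTTP 400, code "context_length_exceeded"). Whether Cursor
# matches on this code is untested.

def translate_error(upstream_body: dict) -> dict:
    code = upstream_body.get("error", {}).get("code", "")
    if code == "model_max_prompt_tokens_exceeded":  # our proxy's custom code
        return {
            "error": {
                "message": "This model's maximum context length has been "
                           "exceeded by the request.",
                "type": "invalid_request_error",
                "param": "messages",
                "code": "context_length_exceeded",
            }
        }
    return upstream_body  # pass any other error through unchanged
```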

thanks again.