Auto-summarization problem with custom OpenAI proxy

Cursor IDE (v2.6.19, macOS arm64)

When using an OpenAI Base URL override (pointing to a custom proxy), Cursor does not perform context summarization when the context window fills up. Instead, the conversation crashes with a model_max_prompt_tokens_exceeded error.

Here’s what I’ve observed by intercepting traffic through my proxy:

  1. Cursor sends the full conversation history on every request (no incremental updates)

  2. As the conversation grows, the messages[] array keeps expanding (I’ve seen it reach 370+ messages / 128K tokens)

  3. When the upstream API returns 400 - prompt token count exceeds the limit, Cursor does not attempt to summarize or compress the context

  4. The conversation simply breaks — no recovery, no automatic summarization

My proxy implements its own summarization fallback (using a local LLM), which successfully compresses the context and retries. However, on the next user message, Cursor again sends the full uncompressed history, hitting the same limit.
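For context, the proxy-side fallback I described is roughly the following sketch. All names here (compress_history, summarize_locally, the token budget, and the 4-chars-per-token estimate) are my own illustrative choices, not anything from Cursor:

```python
# Sketch of a proxy-side fallback: when the upstream model rejects the prompt
# for being too long, compress older turns into a summary and retry.
# TOKEN_BUDGET and KEEP_RECENT are illustrative values, not real limits.

TOKEN_BUDGET = 4000      # hypothetical limit for the example
KEEP_RECENT = 6          # always keep the last N non-system messages verbatim

def estimate_tokens(messages):
    # Crude heuristic: roughly 4 characters per token.
    return sum(len(m["content"]) // 4 for m in messages)

def summarize_locally(messages):
    # Stand-in for a call to a local LLM; here we just join and truncate.
    joined = " | ".join(m["content"] for m in messages)
    return "Summary of earlier turns: " + joined[:200]

def compress_history(messages):
    """Keep the system prompt and recent turns; summarize the middle."""
    if estimate_tokens(messages) <= TOKEN_BUDGET:
        return messages
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    dropped, kept = rest[:-KEEP_RECENT], rest[-KEEP_RECENT:]
    summary = {"role": "system", "content": summarize_locally(dropped)}
    return system + [summary] + kept
```

This works for a single retry, but (as noted above) the compression never makes it back into Cursor's own conversation state, so the next turn starts from the full history again.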

I assume I'm missing some integration step on my end.

Thanks in advance.

Hey there,

You didn’t miss anything; this is a known limitation when using the Override OpenAI Base URL option with a custom proxy.

Two things are going on:

1) Auto-summarization doesn’t trigger
Cursor only retries with summarization when it recognizes specific “context too long” errors from supported providers. When your proxy returns a different error (e.g., model_max_prompt_tokens_exceeded), it isn’t matched, so no retry happens, and the request fails.

2) Internal ops may be misrouted
Even proactive summarization (which should run before hitting the limit) can get routed through your custom endpoint instead of Cursor’s infrastructure. If your proxy doesn’t support the summarization model Cursor uses, it fails silently.

Also expected: your proxy’s own summarization won’t persist across turns. Cursor maintains the conversation state, so it keeps sending the full, uncompressed history on each request.

Workarounds for now:

  • Start a new conversation around ~60–70% context usage

  • Try /summarize before hitting the limit (may still fail depending on proxy support)

  • Use Cursor’s native routing for longer threads; reserve the Base URL override for shorter sessions

The team is aware of these BYOK summarization gaps. No firm timeline yet, but it’s being tracked.


Thanks a lot.

> recognizes specific “context too long” errors from supported providers.

Is this documented anywhere, maybe on the provider’s website?

I could mimic similar behavior at the proxy level if I had access to that specific string/trigger. For now I have implemented a sliding window + summary, but the native mechanism would be much better.
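One experiment I may try: the official OpenAI API documents a 400 error with code "context_length_exceeded" for oversized prompts, so the proxy could rewrite its own limit error into that documented shape before returning it. Whether Cursor actually keys off this code is an assumption on my part; the translate_error function and our custom upstream code are my own naming:

```python
# Hypothetical proxy-side translation: rewrite our custom limit error into
# the error body shape the official OpenAI API documents for oversized
# prompts (HTTP 400, code "context_length_exceeded"). Whether Cursor
# matches on this code is untested.

def translate_error(upstream_body: dict) -> dict:
    code = upstream_body.get("error", {}).get("code", "")
    if code == "model_max_prompt_tokens_exceeded":  # our proxy's custom code
        return {
            "error": {
                "message": "This model's maximum context length has been "
                           "exceeded by the request.",
                "type": "invalid_request_error",
                "param": "messages",
                "code": "context_length_exceeded",
            }
        }
    return upstream_body  # pass any other error through unchanged
```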

thanks again.