How to disable/defer "Chat context summarized." interruptions?

I do rocket-science stuff and benefit from spending hours crafting my chat, and I get excellent results. Then the dreaded “Chat context summarized” shows up, wipes out all my hours of work, and Claude goes back to being an in-the-dark idiot. I use “duplicate chat” as much as possible to try to avoid that, I “go back in time” whenever possible (edit prior messages), and my tools and instructions are already as lean as possible… but I still hit the problem often enough that it’s a major time-waster.

I notice it can happen when I’m only using 59% context - so something seems to be triggering this unnecessarily? I’ve tried “DO NOT SUMMARIZE” instructions - but it looks like there’s no agentic control over when it decides to do this?

Does anyone know how to tell it NOT to do any summary stuff unless there’s no other option (e.g. 99% context use)?
Maybe a “/nosummarize” command is needed, to at least give us back our work?

https://cursor.com/docs/agent/chat/summarization

2 Likes

100% agreed!
This is so annoying… I literally built a big feature that requires having many files in the model’s context for each step in the chat… so I’m building the context brick by brick, and then this “Chat context summarized” step made the model completely lose memory of details it had been handling with no issues before…
I assume this feature was originally requested to ‘save tokens’ by avoiding resending the whole chat… Well, it’s doing the opposite now :angry: !!
I’m not sure, but I’ve also noticed that this happens when you leave the chat thread for some time…
PLEASE REVERT THIS CHANGE OR MAKE IT OPTIONAL!

1 Like

I’ve been railing against this for months; they’re not inclined to change it. I thought there was a brief moment of respite when it prompted “Summarize to continue”, but alas, no, that went away in the last update. It is beyond infuriating.

If I go back in my chat history to re-run the previous command, it won’t restore the chat either, because it’s been destroyed. Hands down the weakest piece of Cursor, frustrating and poorly implemented. Stop so I can switch to a 1M-token model, for crissakes, or give me tools to edit the context.

Otherwise I love Cursor, and the Composer 1 model is awesome. But it burns tokens and context like nobody’s business, which makes this even more important.

@deanrie is there anyone you can relay this to, please?!

Totally agree - “Chat context summarized” MUST NOT BE FORCED BY CURSOR without explicit permission from the user. This “feature” is a total disaster. It ruins days of your work!!! Please, make it OPTIONAL!


Have you actually tested that?
I notice that compression kicks in when >50% of my context is used - which is mad-annoying, because it’s wasting half the context every time!
I’d rather have an expensive smart model “get it right” than a “crippled by compression” model mess it up and make me have to spend even more to fix the mess…

If you “duplicate chat”, I notice it tends to do compression even sooner, so there are some back-end bugs with this as well… It might be related to cache-pricing discounts or something… but like I said before - I value accuracy over cost every time! I don’t want Cursor “trying to save me money” - I want to manage my OWN costs and context, thank you!!

MAX should work like this. If it doesn’t, file a bug report.

In my case, I don’t encounter any context autocompression at all even in normal mode until the context window is really close to being full.

All LLMs except GPT-5.2 lose significant quality as their context window fills. So, if a section of work is complete, it’s best to use /summarize or open a new chat rather than trying to manage the entire project in a single chat.

For long-term memory, use Markdown documents.
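Something like this is what I mean in practice. A minimal sketch, assuming a plain file in the repo (the PROJECT_MEMORY.md name and the section headings are just my convention, nothing Cursor-specific): have the agent append a dated entry after each chunk of work, then attach the file to the first prompt of every new chat.

```python
from datetime import date
from pathlib import Path

MEMORY_FILE = Path("PROJECT_MEMORY.md")  # hypothetical name; any file the agent can re-read works

TEMPLATE = """# Project memory

Long-term notes the agent should re-read at the start of every new chat.

## Decisions

## Constraints

## Open questions
"""

def append_entry(section: str, note: str) -> None:
    """Append a dated bullet under the given section so it survives chat resets and summarization."""
    if not MEMORY_FILE.exists():
        MEMORY_FILE.write_text(TEMPLATE, encoding="utf-8")
    text = MEMORY_FILE.read_text(encoding="utf-8")
    heading = f"## {section}"
    bullet = f"- {date.today().isoformat()}: {note}\n"
    if heading in text:
        # Insert the bullet right after its heading.
        text = text.replace(f"{heading}\n", f"{heading}\n{bullet}", 1)
    else:
        # Section missing: add it at the end of the file.
        text += f"\n{heading}\n{bullet}"
    MEMORY_FILE.write_text(text, encoding="utf-8")

if __name__ == "__main__":
    append_entry("Decisions", "Keep auth token refresh client-side")
```

Re-attaching that file costs a few hundred tokens, instead of hours of re-prompting after a summarization.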

1 Like

I only use Opus-4.5, and that is definitely NOT my experience. Not to mention “All LLMs…” sounds like something someone told you that nobody could really know, and you believed it? If it’s not your own first-hand experience, it is usually not true. So many people have no idea what they’re doing that all “advice” is basically worthless.

This is a known limitation of the attention mechanism in transformer-based LLMs. Believe what you want.

Yes, this doesn’t mean you need to compress context for every chat request, but you should do it whenever possible. It’s part of context engineering.

1 Like

Hello, are you saying that MAX mode should prevent the chat context from being summarized? If so, it doesn’t work like that for me. It always summarizes the context whenever it grows a bit past 50%.

There are definitely issues with the summarization feature that need to be addressed. It is also not always clear when it is happening, and I think much of the time it is stealth-summarized during “Planning Next Moves” and the like.

That said… remember that in a given chat, every time you prompt again, ALL the prior context has to be sent again, along with your new prompt and any newly attached context. Between the context you attach to a given prompt and whatever predictive assumptions the agent makes about how much additional context its internal functionality (system prompt, etc.) might need, it is not necessarily unexpected to go from, say, 50%-ish to over 90%-ish with a single prompt.
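To put rough numbers on that (all made up for illustration; I have no insight into Cursor’s actual accounting), here is the kind of back-of-the-envelope math that can take one prompt from ~50% to ~90% of a 200k window:

```python
# Toy figures on a 200k-token window, purely illustrative; none of these come from Cursor.
window = 200_000

prior_chat      = 100_000  # the whole conversation so far, resent with every prompt (~50%)
system_overhead =  10_000  # system prompt, rules, tool definitions, etc. (a guess)
new_prompt      =   2_000  # the message you actually typed
attached_files  =  40_000  # a few large files attached to this prompt
output_headroom =  30_000  # space the model needs to write its response

projected = prior_chat + system_overhead + new_prompt + attached_files + output_headroom
print(f"projected usage: {projected / window:.0%}")  # -> projected usage: 91%
```

If the agent is projecting something like that before sending the request, a trigger at “only” 50% is at least explainable, even if the aggressiveness isn’t.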

Now, I agree, Cursor should not be preemptively summarizing without need. If they are summarizing at 50% when the next prompt would only push that to 65%, summarization is UNNECESSARY, and if they waste our time summarizing anyway, that’s a very real problem. I mostly use Claude models. Those models support a 1M context, but we are provided 200k, so Cursor is already limiting context to avoid the problems that occur when you over-use it. There shouldn’t be a need to over-aggressively compress context when it is not strictly necessary.

I think part of the problem is that Cursor has a SEVERE LACK of insight and transparency into context usage. This has been a problem for some time, but as my usage of the agent gets more refined and advanced, I find it has become critically essential that they stop obscuring context usage behind their nearly-useless, tiny little context-usage indicator in the prompt box. They can keep that, but we need to be able to CLICK it, or something, and see a DETAILED REPORT of context usage: EVERYTHING, their internal usage and ours, out of the full context window the selected model provides.

Anything less, and…well, here we are. :smiley:

A thought on the progressive planning approach. Because context IS and CAN BE summarized, I periodically have the agent “flush” the current state of the plan to a markdown file or something like that. When the agent seems like it has lost detail in its context, I can have it refresh itself by referencing the markdown file.
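Concretely, my “flush” is nothing fancier than the sketch below. The PLAN.md name and the prompt wording are just my own habit, not special Cursor commands, and the little helper is only for when I paste a plan in by hand:

```python
from datetime import datetime
from pathlib import Path

PLAN_FILE = Path("PLAN.md")  # hypothetical filename; anything the agent can reference works

# What I ask the agent when I want a snapshot before summarization can eat the detail:
FLUSH_PROMPT = (
    "Write the current plan to PLAN.md: goals, completed steps, remaining steps, "
    "and any file paths or decisions you'd need to pick this work back up cold."
)

# What I ask after a "Chat context summarized" wipes the detail:
REFRESH_PROMPT = "Re-read PLAN.md and continue from the first unfinished step."

def snapshot(plan_text: str) -> None:
    """Overwrite PLAN.md with the latest plan dump, timestamped so stale snapshots are obvious."""
    stamp = datetime.now().isoformat(timespec="seconds")
    PLAN_FILE.write_text(f"<!-- snapshot {stamp} -->\n{plan_text}\n", encoding="utf-8")
```

It’s lossy too, of course, but at least it’s lossy on my terms.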

File a bug report

I’m on the 2.5.0 nightly build, and I seem to be experiencing hidden context compression. Or, the Cursor Team has even managed to break the context fullness indicator.

Theoretically, GPT-5.2 can generate 62k tokens for you IN ONE RESPONSE. I asked it to generate an inappropriately large text insertion into a file as an experiment, and after 12 minutes of generation, I simply got tired of waiting for the response to finish.

And if the model hits the edge of the context window, you’ll get a fragment of the response.

So how do you configure this thing to work perfectly?


P.S. I don’t deny that random functions break in Cursor all the time.

So, response vs. request? I guess if the response is large enough to bust the context limit, then yes, you would need to summarize to accommodate it. In my case, I see summarization happening, including what I believe are summarizations during “Planning Next Moves” or “Generating…” while a response is being generated.

I do think it is a complex problem, though, and not as simple and straightforward as it seems. The frustration is that summarization seems to be happening FAR more frequently lately than it did in the past. When the summarization feature was first introduced, and for the month or so after, summarization seemed to occur less frequently, and mostly when context usage was higher.

Lately, however, it seems like summarization occurs constantly, and it is a time-consuming process that now seems to interfere with the majority of requests. I am here now because I had about 27% context usage, and I’m pretty sure summarization happened, as my context usage initially dropped to about 23% before increasing after the response to that request. Why the heck would they summarize at such low context usage and waste my time (it took several MINUTES to complete the process!), when there was plenty of context space left, and even with the added context I was still only in the 30% usage range?

There has to be a balance here. Aggressive summarization is wasting developer time. If it takes even a minute to complete each time, and it is now being done before the majority of requests, that adds up to a HUGE waste of developer time. Often enough, however, it takes many minutes to summarize, not just one. I understand the necessity of summarization, and while it is effectively a form of lossy compression, it can be optimized to retain as much critical detail as possible while still reducing context usage (as you stated, this is part of good context engineering)… however, it shouldn’t be done so aggressively that it has a detrimental impact on developer experience and developer productivity. Right now, it seems to be detrimental to both.

bruh… The Context Used indicator shows the context of the Subagent