Small context increasing cost?

Just a thought: the context windows being small and managed aggressively is not always a problem in itself; I think the problem is HOW AGGRESSIVE the context compression is (the summarization system you use). For context (pun intended), I often run into issues where things start messing up, functions get renamed, etc. In those cases I set a rule and keep it running until it starts working again, sometimes for days (it's a side project, it runs in the background). This racks up a LOT of requests. Sometimes I use Roo Code instead, and within a few prompts (and a context window around 100k) it resolves the problem Cursor was stuck on forever. Sometimes I just need to let it read in the whole file (around 600-800 lines).

I guess what I am getting at is that I just kept it going, racking up way more cost against Cursor's APIs than if the context COMPRESSION were not so aggressive. (Not the size; 100k on Gemini is plenty.)
I feel like there is something to be said for letting us toggle how aggressively the context gets compressed, even if the window is fixed at a smaller size (which I think is fine). I would love a little control over the conversation summary size, how frequently summaries happen, and the ability to manually add anything missed in the summarization process, as well as being able to disable it completely (with compression automatically re-enabled for every new conversation, so disabling is an explicit, per-conversation choice that only people who really want it will make).
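To make that concrete, here is a rough sketch of the kind of knobs I mean. Every name here is made up by me for illustration; none of these are real Cursor settings:

```typescript
// Hypothetical settings shape - none of these keys exist in Cursor today.
interface ContextCompressionSettings {
  // Master switch; when false, no summarization happens at all.
  enabled: boolean;
  // Reset to true at the start of every new conversation, so disabling
  // compression is always an explicit, per-chat choice.
  reenablePerConversation: boolean;
  // Start summarizing only once the conversation fills this fraction
  // of the model's context window (e.g. 0.9 = 90% full).
  triggerAtWindowFraction: number;
  // Upper bound on the generated summary, in tokens.
  maxSummaryTokens: number;
  // Notes the user pins so they survive every summarization pass
  // (e.g. "do NOT rename functions").
  pinnedContext: string[];
}

// The "less aggressive" profile I would pick for my side project:
const myProfile: ContextCompressionSettings = {
  enabled: true,
  reenablePerConversation: true,
  triggerAtWindowFraction: 0.9, // compress late, not eagerly
  maxSummaryTokens: 8_000,      // keep summaries roomy
  pinnedContext: ["Function names in the API layer must not change"],
};
```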

I would love to have the same context window bar that Cline or Roo Code have; that way I would know when I hit the limit, and if compression is on, I could see how much the compression is actually helping. Basically, if we had the ability to see how full the context window is and when compression happens, to have a bit of control over it (and maybe choose the summarization LLM via an MCP or something), and to tune how aggressively it happens, it would likely SAVE YOU MONEY and SAVE US TIME and give us a better user experience!
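Again purely as an illustration (invented names, not any real Cursor or Cline API), the bar only needs a handful of numbers to be useful:

```typescript
// Hypothetical shape of the data a context-usage bar would render.
interface ContextUsageSnapshot {
  windowTokens: number;           // total context window of the model in use
  usedTokens: number;             // tokens currently occupying the window
  compressionEvents: number;      // times summarization has fired this chat
  tokensSavedBySummaries: number; // rough measure of how much compression helped
}

// Render a simple text bar like the one Cline / Roo Code show.
function renderUsageBar(s: ContextUsageSnapshot, width = 20): string {
  const filled = Math.round((s.usedTokens / s.windowTokens) * width);
  const bar = "#".repeat(filled) + "-".repeat(width - filled);
  return `[${bar}] ${s.usedTokens}/${s.windowTokens} tokens, ` +
    `${s.compressionEvents} compressions (~${s.tokensSavedBySummaries} saved)`;
}

console.log(renderUsageBar({
  windowTokens: 100_000,
  usedTokens: 72_500,
  compressionEvents: 2,
  tokensSavedBySummaries: 31_000,
}));
// [###############-----] 72500/100000 tokens, 2 compressions (~31000 saved)
```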

Until something is done to address this, I will keep it running with a rule I create, burning potentially hundreds of requests whenever it gets stuck, when it very likely would have solved the problem in a handful of requests with full file reads and no context compression.

It would save us both time and money. Win-win.
The big thing is likely the dev time; it is not a small ask, I understand. But features like this are desperately needed. Something has to be done about the context window handling, and I don't think you will find a one-size-fits-all answer for it (I don't think one-size-fits-all works anywhere). Making it configurable would give you the edge you need right now as a business, because there is a lot of competition and it is growing quickly. I am myself tempted to leave for the other options simply because of the context window issues. Again, I am fine with a smaller context; we just need some control over the compression/summarization you perform, and the ability to disable it completely at times.

Just a few thoughts from a PM at a major tech company which will remain unnamed. (Trying to help you guys out here.)

Doesn't that also depend on the user's ability, though? Not the ability to set a summarization setting, but understanding what it enables or breaks.

We both know what's behind ONE request, incl. tool-call iterations… then the chat thread length, attached context, etc.

For experienced users your idea would work; it's basically what users get when Cursor adds a model and hasn't yet adjusted it for its wide user base. But would it work for the majority?

Not sure if I am misreading your statements, but letting it run for hundreds of requests when it gets stuck makes no sense unless you have time (and money) to burn. People attribute to Einstein a quote where he allegedly had a term for repeating an action while expecting different results.

The issue will likely fade with time, as we are seeing larger and larger context models. They aren't yet that capable at those extremes but are getting better with each version. Meta's deranged Scout - with a pea brain but a mouth like a whale - comes to mind as the wrong way to address this issue.