When using Gemini 3 Flash, the model constantly triggers chat summarization. I almost never get past 5k tokens; the context only grows when it reads a large file, and then compression kicks in immediately. Afterwards it completely forgets the task.
Gemini 3 was my preferred Agent model. This doesn't stop me from using Cursor, but it has made me experiment with Copilot and Antigravity.
Steps to Reproduce
Create a project with at least 2 large files (>5k tokens).
Ask the agent to review the two files sequentially.
Observe context window after loading first file.
I get an answer after the review of the first file, without it scanning the second, because it forgets the task.
Expected Behavior
The chat should only be summarized when the context window (200k) is full. That is how it worked last week; I have seen the issue only since Feb 11 or so.
It looks like the context summarization thresholds for Gemini 3 Flash might be set incorrectly. Summarizing at 5k in a 200k window is definitely not expected behavior.
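Conceptually, the expected behavior is a threshold check near the window limit, something like the sketch below (the constants and function are illustrative assumptions, not Cursor's actual implementation):

```python
# Illustrative sketch only -- not Cursor's actual code.
CONTEXT_WINDOW = 200_000   # Gemini 3 Flash context window (tokens)
SUMMARIZE_RATIO = 0.9      # hypothetical threshold near the limit

def should_summarize(used_tokens: int) -> bool:
    """Summarize the chat only when the context is nearly full."""
    return used_tokens >= CONTEXT_WINDOW * SUMMARIZE_RATIO

print(should_summarize(5_000))    # 5k is nowhere near the limit: False
print(should_summarize(195_000))  # close to 200k: True
```

If the threshold (or the window size the check uses) were misconfigured for this model, summarization would fire at a few thousand tokens, which matches the reported behavior.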
Can you share the Request ID from one of those problematic sessions? (Three dots in the top-right corner of the chat, then Copy Request ID.) That’ll help the team figure out what’s triggering the early summarization.
As a temporary workaround, try Gemini 3 Pro or another model for these tasks while we look into the root cause.
I don't usually use Gemini 3 Pro, but I tested it and it has the same problem. Request ID: 68c84653-cf00-4ee8-81f7-5a6404c14586. Other models (OpenAI, Anthropic, your Composer) do not have the problem.
The issue is still present with the newest Cursor (2.5.20).
I think that this Reddit post or this one might describe the same issue. When I tested today, it also consumed 4.4M tokens within about 30 seconds, with the context window circle constantly indicating load, immediately followed by context summarization.