A better question might be: “Is there any benefit to using MAX mode with models whose maximum context window doesn’t increase in size?”
The documentation pages Context → Max Mode and Models → Max Mode both state that MAX mode increases the maximum context window for certain models. However, Pricing → Max Mode also states that MAX mode allows for longer reasoning chains.
The benefits are clear for models like Gemini 2.5 Pro, whose context window increases from 200k to 1M tokens. However, for models like Claude 4 Sonnet, whose context window stays at 200k:
Does MAX mode increase the amount of time or resources the model is willing to spend on a given prompt (i.e. longer thinking chains if reasoning is enabled, or longer maximum output length per edit)?
Does MAX mode potentially increase the number of tool calls made per prompt (i.e. make the model more willing to make additional autonomous Read calls to gain more context)?
Knowing the answer to these ahead of time would be really valuable; I’d love to be able to toggle on MAX mode to give the model some additional time to think through a harder problem, or to let it run autonomously through a particularly long series of edits or implementation phases.
If someone could offer some clarification on those points, I’d really appreciate it.
@JhonW00d There are not as many differences now between Max and non-Max modes as they used to be.
One of the differences in Max mode is how many lines are read from a file at once: up to 750 lines in Max mode versus up to 250 lines in non-Max mode.
Cursor uses AI models from several providers. Each model has a context limit set by its provider. We may also set our own context limit for a model in non-Max mode, depending on its performance.
When you enable Max mode, Agent can read up to 750 lines but the model decides what lines are necessary.
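The read-limit difference described above can be sketched as a toy function. The 250/750-line limits come from this thread; the actual Read tool’s interface is internal to Cursor, so the function name and shape here are purely illustrative:

```python
def read_chunk(lines: list[str], start: int, max_mode: bool = False) -> list[str]:
    """Toy illustration: return one read 'chunk' of a file.

    The limits (250 lines normally, 750 in Max mode) are the figures
    mentioned in this thread; this is NOT Cursor's real Read tool.
    """
    limit = 750 if max_mode else 250
    return lines[start:start + limit]

# A 1000-line file takes four reads normally, but only two in Max mode.
file_lines = [f"line {i}" for i in range(1000)]
print(len(read_chunk(file_lines, 0)))                 # 250
print(len(read_chunk(file_lines, 0, max_mode=True)))  # 750
```

In both modes the model still decides which ranges are worth reading; Max mode just lets each read cover more of the file.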
Max mode can be more expensive if more context is used. Because Max mode allows a much higher context limit for some models, it can cost more depending on the model and its per-token API prices.
There is no pricing multiplier for Max mode; cost depends only on tokens. Given the same number of tokens, Max mode costs the same as non-Max mode. Max mode costs more only when more tokens are actually used, for example when a user attaches files to provide extra context or when the task requires more code in context.
Max mode should mainly be enabled when regular non-Max mode cannot provide a large enough context window and the model supports a larger one.
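The pricing point above can be made concrete with a small sketch: cost is a pure function of token counts and the model’s per-token API prices, with no separate Max-mode factor. The prices below are made-up placeholders, not Cursor’s or any provider’s actual rates:

```python
def api_cost(input_tokens: int, output_tokens: int,
             input_price_per_m: float = 3.0,    # $/1M input tokens (hypothetical)
             output_price_per_m: float = 15.0   # $/1M output tokens (hypothetical)
             ) -> float:
    """Cost depends only on token counts and per-token prices;
    there is no multiplier for Max mode."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Same token counts -> same cost, whether or not Max mode is on.
typical = api_cost(50_000, 2_000)

# Max mode tends to cost more only because more context fits,
# e.g. attaching large files pushes the input token count up.
heavy = api_cost(400_000, 2_000)
assert heavy > typical
```

In other words, toggling Max mode on doesn’t change the rate; it changes how many tokens a conversation can (and tends to) consume.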
@condor Thanks for all the info, this is exactly what I was looking for.
Here is my new understanding of MAX mode. These are just my thoughts; feel free to correct me on any point if I miss the mark.
The primary purpose of MAX mode in the current version of Cursor is to let developers use certain models’ extended context windows when needed. It essentially allows Cursor to set reasonable default maximum context lengths (for cost and functional effectiveness) while giving us an explicit way to opt into a model’s full context window when we need it.
MAX mode is my way to tell Cursor I’m okay with heavier context usage (i.e. more reads and cache writes), at the expense of potentially higher costs. “Higher costs” here comes from more readily populating the context and using extended context if the chat becomes long enough, NOT from an increase in cost per token.
In the old request-based pricing model (fixed number of messages per month), MAX mode would also increase the maximum number of tool calls the model was allowed to make per message (presumably to save costs). However, now that Cursor has moved to a token-based pricing model, that’s no longer needed because the extended tool call functionality has been integrated into regular mode chats.
So, back to my original question: Is there any benefit to using MAX mode with models whose maximum context window doesn’t increase in size?
For non-reasoning models, I would say “not really”, except in cases where you have the model autonomously explore a project to collect a large amount of relevant context. For example, I like to prime conversations with a first message like “Deeply explore the code base to gain a deep understanding of this project's 'context management' feature ...” to ensure the most relevant code is in context before I start actually modifying the feature. That’s one use case I still think is relevant here.
For reasoning models, my answer is “maybe”. The docs at Pricing → Max Mode say it may enable longer reasoning (does this mean higher reasoning effort?). However, that isn’t stated consistently across the docs, and to be honest, it would be hard (if not impossible) for me to measure its effect anyway.
Also, the point about Max mode reading up to 750 lines at a time is really valuable to know.
Like @JhonW00d said, I’d love to see info like this made clearer in the docs. It’s written as “Reads up to 250 lines (750 in max mode) of a file” under Agent → Tools → Read File, which is fine, but I totally missed this because I never thought to look under specific tools for information about MAX mode.
One last question: the Pricing → Max Mode docs currently state that “Certain models have the ability to use Max Mode, which allows for longer reasoning and larger context windows.”
@condor Can you please verify whether MAX mode still enables longer reasoning for “thinking models” like o3 and Claude 4 Sonnet Thinking? This isn’t mentioned in the Models → Max Mode documentation, so I was curious whether it’s still the case.