I just went through the documentation and noticed that other models only support up to 60k tokens.
However, we know Google Gemini 2.5 Pro supports up to 1M tokens. Is this a limitation in Cursor, or is there another explanation?
I would love to know what the context window of Gemini 2.5 pro-exp-03-25 actually is vs. Gemini 2.5 Pro MAX, and it would be great if the Cursor team added this to the documentation before releasing things.
Ah, they just launched a new version that supports Gemini Pro Max
haha, I bet it's similar to Claude Max.
I'd highly recommend using OpenRouter and Cline if you're planning to work with large contexts.
Cline is too expensive
Yesterday I was using it with OpenRouter for free.
You can try it as well.
How did you use it for free? The API or the App?
I would really like Cursor to stop nerfing the LLMs' context. Seriously, this is getting boring.
I'm using AI Studio instead, because there you get the 1M context and it's free. Why make things complicated like this?
Hi there.
Gemini API is free for now (experimental).
Looks like Cursor updated their page on context window sizes: Cursor – Models
According to their docs, it seems only their "Gemini 2.5 Pro MAX" setting actually uses the 1M token context. That's quite surprising.
Does anyone know if the Google API bills differently based on the requested context size itself? Like, is a request set up for 1M tokens priced differently than a request set up for 128k tokens?
I'm trying to avoid using Cursor's "MAX" option because of the $0.05 per-action cost. It charges you even for simple things like reading a file. The total cost from these small actions seems like it could add up to much more than the actual Gemini 2.5 Pro API call would cost directly.
I agree with you that the $0.05 per tool use is quite high. Is there a different way that part could be billed? Perhaps the initial request price could be increased instead?
Using the "Free" tier is great for testing, yes, but if you are working on a professional project, be careful: your prompts might be used for future model updates.
Yes, Google's APIs (AI Studio and Vertex AI) charge based on the number of tokens you send and the model generates. There's no "requested" context size; you're billed only for what you actually use, so a request that uses 128k tokens costs less than one that uses 1M.
Google's API charges per token:
For example, Flash 2.0 via the AI Studio API is $0.10 per 1M input tokens and $0.40 per 1M output tokens (priced down to the individual token).
Pro 1.5 via the AI Studio API cost up to $2.50 per 1M input tokens and up to $10 per 1M output tokens.
I would expect Pro 2.5 to land between $1 and $3 per 1M input tokens, and $5 to $15 per 1M output tokens.
So at $0.05 per tool use, where each tool call can include up to 1M input tokens, that's cheap. At $1 per 1M input tokens, $0.05 buys you at most 50k input tokens of actual use, or somewhat less once you also account for output tokens.
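To make the arithmetic concrete, here's a quick back-of-the-envelope sketch in Python. The rates are the assumed figures from above, not confirmed Gemini 2.5 Pro pricing:

```python
# Back-of-the-envelope cost check for per-token API billing.
# Rates below are assumptions from this thread, not official pricing.
INPUT_RATE_PER_M = 1.00   # assumed $ per 1M input tokens
OUTPUT_RATE_PER_M = 5.00  # assumed $ per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request, billed only on tokens actually used."""
    return (input_tokens * INPUT_RATE_PER_M
            + output_tokens * OUTPUT_RATE_PER_M) / 1_000_000

print(f"128k in / 2k out: ${request_cost(128_000, 2_000):.4f}")    # ~$0.14
print(f"1M in / 8k out:   ${request_cost(1_000_000, 8_000):.4f}")  # ~$1.04

# Break-even vs Cursor's $0.05 per tool call, ignoring output tokens:
print(f"$0.05 buys ~{0.05 / INPUT_RATE_PER_M * 1_000_000:,.0f} input tokens")  # 50,000
```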
Regarding tool usage and context limits, I suspect that tools don't each process the full 1 million tokens independently. It's more likely they use smaller models for pre-processing and then make one request to the core model. Am I wrong?
If so, the current $0.05 per-tool-use pricing seems counter-productive for users wanting to leverage the full context window of models like Gemini 2.5 Pro. The current approach forces users onto usage-based pricing, which can become significantly more expensive.
A better approach might be to introduce a higher subscription tier, maybe an "Ultimate" plan, that includes full context access. While the base $20 plan might be insufficient for heavy usage, tiered options would offer more flexibility and predictability than the current per-tool fee combined with usage-based pricing for large contexts.
They still include the conversation history, and some codebase context, per tool invocation (presumably). It's not really about the "tool call" itself: when these LLMs make a tool call, it's actually the end of an existing LLM invocation that asks the Cursor system to run a tool. Once the tool has run and finished, they have to resend the conversation history, context, and tool call result so that the LLM can "continue" where it left off. Each tool call is therefore roughly equivalent to a full LLM invocation or a new user message.
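To make that flow concrete, here's a minimal sketch of a generic agentic tool-call loop. All names (`llm.complete`, the `tools` dict) are hypothetical illustrations of how this pattern generally works, not Cursor's actual internals:

```python
# Minimal sketch of a generic agentic tool-call loop (hypothetical names,
# not Cursor's actual implementation).
def run_agent(llm, tools, system_prompt, codebase_context, user_message):
    history = [system_prompt, codebase_context, user_message]
    while True:
        # Every iteration re-sends the ENTIRE history, which is why each
        # tool call costs roughly as much as a fresh user message.
        response = llm.complete(history)
        if response.tool_call is None:
            return response.text  # the model is done answering
        # The model "paused" to request a tool; run it locally...
        result = tools[response.tool_call.name](**response.tool_call.args)
        # ...then append the call and its result so the next invocation
        # can continue exactly where the previous one left off.
        history += [response.tool_call, result]
```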
However, all that being said, I usually don't see it using more than 50k tokens per message or tool call (for me, at least). And the context only grows big enough for MAX to "maybe" matter when it starts prompting you to create a new chat.
I don't see much benefit to MAX for me. I don't have convos that last long enough for it to matter. I am using Cursor on codebases that are several hundred thousand lines of code, but it's never pulling in the entire codebase, even on MAX. I'm usually using one chat per feature. Yes, it's a higher max context limit, but larger context isn't necessarily better either (see MRCR benchmarks and others). I think they should instead offer to switch to MAX when it needs more context (giving that option to the user), but always start from the standard version.
The higher subscription tier is something I definitely would agree with. I wouldn't use it myself, but those who want to maximize the context might like that.
Hey, Gemini 2.5 Pro matches Claude 3.7 in having 120,000 tokens as standard, but can access the full 1M tokens in Max mode!
The Models page of our Docs is now updated to reflect this!
To @decsters01's point, we haven't nerfed or downgraded any of the LLMs specifically - the context window has always remained consistent, and has only increased as we have improved Cursor and its interaction with the models.
As @ThomasT says, while Google are offering Gemini for free via their API for individual users, they will likely use your data to train Gemini 3, so for now you do get that privacy benefit with Cursor.
We are looking at the pricing structure for these models moving forward to see what is simple, sustainable and fair, but nothing is changing yet!