I just went through the documentation and noticed that other models only support up to 60k tokens.
However, we know Google Gemini 2.5 Pro supports up to 1M tokens. Is this a limitation in Cursor, or is there another explanation?
I would love to know what the context window of Gemini 2.5 pro-exp-03-25 actually is vs. Gemini 2.5 Pro MAX, and it would be great if the Cursor team added this to the documentation before releasing things.
Ah, they just launched a new version that supports Gemini Pro Max
haha, I bet it's similar to Claude Max.
I'd highly recommend using OpenRouter and Cline if you're planning to work with large contexts.
Cline is too expensive
Yesterday I was using it with OpenRouter for free.
You can try it as well.
How did you use it for free? The API or the App?
I would really like Cursor to stop nerfing the LLMs' context. Seriously, this is getting boring.
I'm using AI Studio instead, because there you get the 1M context and it's free. Why make things complicated like this?
Hi there.
Gemini API is free for now (experimental).
Looks like Cursor updated their page on context window sizes: Cursor – Models
According to their docs, it seems only their "Gemini 2.5 Pro MAX" setting actually uses the 1M token context. That's quite surprising.
Does anyone know if the Google API bills differently based on the requested context size itself? Like, is a request set up for 1M tokens priced differently than a request set up for 128k tokens?
I'm trying to avoid using Cursor's "MAX" option because of the $0.05 per-action cost. It charges you even for simple things like reading a file. The total cost from these small actions seems like it could add up to much more than the actual Gemini 2.5 Pro API call would cost directly.
I agree with you that the $0.05 per tool use is quite high. Is there a different way that part could be billed? Perhaps the initial request price could be increased instead?
Using the "Free" tier is great for testing, yes, but if you are working on a professional project, be careful: your prompts might be used for future model updates.
Yes, Google's APIs (AI Studio and Vertex AI) charge based on the number of tokens you send and the model generates. There's no "requested" context size; you're billed only for what you actually use, so a request that uses 128k tokens costs less than one that uses 1M.
Google's API charges per token:
For example, Flash 2.0 via the AI Studio API is $0.10 per 1M input tokens and $0.40 per 1M output tokens (priced down to the individual token).
Pro 1.5 via the AI Studio API cost up to $2.50 per 1M input tokens and up to $10 per 1M output tokens.
I would expect Pro 2.5 to land between $1 and $3 per 1M input tokens, and $5 to $15 per 1M output tokens.
So at $0.05 per tool use, where each tool call can include up to 1M input tokens, that's cheap. At $1 per 1M input tokens, $0.05 buys you at most 50k input tokens of actual use, or somewhat less once you also account for output tokens.
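To make the arithmetic concrete, here's a quick back-of-the-envelope sketch in Python. The rates are the assumed figures from above, not confirmed Gemini 2.5 Pro pricing:

```python
# Back-of-the-envelope cost check for per-token API billing.
# Rates below are assumptions from this thread, not official pricing.
INPUT_RATE_PER_M = 1.00   # assumed $ per 1M input tokens
OUTPUT_RATE_PER_M = 5.00  # assumed $ per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request, billed only on tokens actually used."""
    return (input_tokens * INPUT_RATE_PER_M
            + output_tokens * OUTPUT_RATE_PER_M) / 1_000_000

print(f"128k in / 2k out: ${request_cost(128_000, 2_000):.4f}")    # ~$0.14
print(f"1M in / 8k out:   ${request_cost(1_000_000, 8_000):.4f}")  # ~$1.04

# Break-even vs Cursor's $0.05 per tool call, ignoring output tokens:
print(f"$0.05 buys ~{0.05 / INPUT_RATE_PER_M * 1_000_000:,.0f} input tokens")  # 50,000
```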
Regarding tool usage and context limits, I suspect that tools don't each process the full 1 million tokens independently. It's more likely they use smaller models for pre-processing and then make one request to the core model. Am I wrong?
If so, the current $0.05 per-tool-use pricing seems counter-productive for users wanting to leverage the full context window of models like Gemini 2.5 Pro. The current approach forces users onto usage-based pricing, which can become significantly more expensive.
A better approach might be to introduce a higher subscription tier, maybe an "Ultimate" plan, that includes full context access. While the base $20 plan might be insufficient for heavy usage, tiered options would offer more flexibility and predictability than the current per-tool fee combined with usage-based pricing for large contexts.
They still include the conversation history, and some codebase context, per tool invocation (presumably). It's not really about the "tool call" itself: when these LLMs make a tool call, it's actually the end of an existing LLM invocation that asks the Cursor system to run a tool. Once the tool has run and finished, they have to resend the conversation history, context, and tool call result so that the LLM can "continue" where it left off. Each tool call is therefore roughly equivalent to a full LLM invocation or a new user message.
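To make that flow concrete, here's a minimal sketch of a generic agentic tool-call loop. All names (`llm.complete`, the `tools` dict) are hypothetical illustrations of how this pattern generally works, not Cursor's actual internals:

```python
# Minimal sketch of a generic agentic tool-call loop (hypothetical names,
# not Cursor's actual implementation).
def run_agent(llm, tools, system_prompt, codebase_context, user_message):
    history = [system_prompt, codebase_context, user_message]
    while True:
        # Every iteration re-sends the ENTIRE history, which is why each
        # tool call costs roughly as much as a fresh user message.
        response = llm.complete(history)
        if response.tool_call is None:
            return response.text  # the model is done answering
        # The model "paused" to request a tool; run it locally...
        result = tools[response.tool_call.name](**response.tool_call.args)
        # ...then append the call and its result so the next invocation
        # can continue exactly where the previous one left off.
        history += [response.tool_call, result]
```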
However, all that being said, I usually don't see it using more than 50k tokens per message or tool call (for me, at least). And the context only grows big enough for MAX to "maybe" matter when it starts prompting you to create a new chat.
I don't see much benefit to MAX for me. I don't have convos that last long enough for it to matter. I am using Cursor on codebases that are several hundred thousand lines of code, but it's never pulling in the entire codebase, even on MAX. I'm usually using one chat per feature. Yes, it's a higher max context limit, but larger context isn't necessarily better either (see MRCR benchmarks and others). I think they should instead offer to switch to MAX when it needs more context (giving that option to the user), but always start from the standard version.
The higher subscription tier is something I definitely would agree with. I wouldn't use it myself, but those who want to maximize the context might like that.
Hey, Gemini 2.5 Pro matches Claude 3.7 in having 120,000 tokens as standard, but can access the full 1M tokens in Max mode!
The Models page of our Docs is now updated to reflect this!
To @decsters01's point, we haven't nerfed or downgraded any of the LLMs specifically - the context window has always remained consistent, and has only increased as we have improved Cursor and its interaction with the models.
As @ThomasT says, while Google are offering Gemini for free via their API for individual users, they will likely use your data to train Gemini 3, so for now you do get that privacy benefit with Cursor.
We are looking at the pricing structure for these models moving forward to see what is simple, sustainable and fair, but nothing is changing yet!