Gemini 2.5 performance: Great yesterday, terrible today

For some reason, using Gemini 2.5 MAX, and having provided all of the files as context, I am still being charged for “read” tool calls … why?

It feels like a cash-grab but I don’t believe that’s your MO.

Additionally, the model’s performance is dramatically weaker after yesterday’s downtime / updates.

1. Context-stuffing is no longer effective: It feels like there’s too much middleman processing (truncation, or “intelligent” attempts to surface relevant context) when I’m providing tons of files to the model. The result is that it feels a lot less “smart” today than it did prior to the updates.

2. Tools don’t run properly: I even preferred running agent mode and manually hitting the apply button; the model seemingly had greater context when I was using my API key.

3. “Ask” isn’t any better, it’s arguably worse.

What’s the solution here? And why are we being charged 5 cents per tool call for a free model?

Suggestion: If we bring our own API key, let agent mode work as it works, let the context be exactly what we provide, and don’t charge a premium for tool calls — just run them at the same included cost as if we were running 3.7 (non-MAX).

It pains me to see a service I use daily hold back a significantly better experience I was already exposed to.

10 Likes

That is exactly my experience too!

Yesterday I felt like I was on a space ship flying into the future. I was cranking on a repo and was zero shotting nearly every request and diff.

I was absolutely floored.

Today the magic that was coming from 2.5 Pro evaporated and it felt like 3.7 again.

Was so incredibly odd.

6 Likes

exactly.

3 Likes

Same here: on the first day of agent support the model was perfect.
Today the model follows the rules poorly, makes more mistakes, and has become worse overall.

Why is it that when you optimize a model it gets worse? Or is this a way to bring people back to Sonnet?

3 Likes

Of course that’s what it is. Same reason that Cursor doesn’t default to free models if you ask it for git status or something that is well within those models’ capabilities.

Or why models will make more expensive tool calls even when rules files provide them with explicit CLI instructions.

3 Likes

It is interesting that they cannot figure out how to have Gemini “support computer mode”. Is it really that difficult to set this model up to function the same as Claude 3.7 as an agent? I would really like Gemini 2.5 to be able to completely access my terminal and run any command it wishes, for the true VIBE CODING experience we deserve.

1 Like

I don’t even mind it burning my credits, but racking up a $250 bill is stupid when I’m bringing my own (free) API key.

4 Likes

It’s total ■■■■ today. It’s doing the thing where it says “Ok I’m going to do XYZ.” then just sits there. You tell it to go ahead. It says it’s going to do it, then just sits there. And rinse and repeat, wasting your calls.

What have they done? It’s terrible.

4 Likes

Keep the pressure up. Devs are active on Reddit but trigger-happy on the bans. Imagine making the most fantastic leaps in productivity and then locking them behind a paywall without communicating anything.

“MAX” is awful too.

4 Likes

Wait you’re racking up that much in costs even though you’re using your own API key!?

I need to switch to using my own Gemini API key since we have an enterprise account through my work, and I thought it wouldn’t be racking up charges. Currently I’m facing a problem where the key verification won’t even go through.

This is crazy, I doubt I’ll be continuing my subscription with Cursor next month.

Yep - and it’s charging me 5 cents per call. I have to say “okay — are you … going to do what you just said you were going to do?”

I don’t think so — although this is really unique because there hasn’t previously been a case where they’ve had to deal with a mad rush of hyper-early adopters bringing their own API keys into the mix … I don’t think it’s specifically a Sonnet thing; I think it’s a platform (Cursor) loophole they raced to fix. They likely saw a sizable revenue bump from 3.7 MAX that quickly flattened when Gemini 2.5 was released.

It’s funny imagining myself in their shoes, because their core user base (power users) consists of relatively technical, in-the-know, bleeding-edge developers. So I’m sure it was a surprising number of people (percentage-wise, relative to the full base of users) who grabbed their own API keys.

2 Likes

It is always typical with Cursor and new LLMs. A great LLM comes along and gets supported in its unadulterated state. It performs great until they start nerfing it. If you complain, they behave like communist dictators and delete your comment. That is why I cancelled my subscription and will not renew it. I finally got a better alternative. I just come here once in a while due to the alerts that keep dropping into my inbox from different threads.

4 Likes

Currently this model’s largest issue is the auto-run agent features not working when Gemini attempts tool calls. Even when I hit the play button to apply, it gets stuck. Sometimes it does work, but most of the time I have this issue.

Look at this post

I believe that if there is one, a big vote might work.

1 Like

Entirely underwhelming. You’d think there would be some alarm bells ringing when any model can’t use the tools.

Okay, I will analyze your codebase structure and content to understand its purpose.

First, I need to see the file structure of your workspace.<ctrl97>tool_code

print(await IAgentToolManager.list_workspace_files())<ctrl98>

and this is with the API: the tool call just comes out as text instead of being executed.

If you’re paying the premium for MAX, it’s just as incapable.

Hey all, while there was an issue affecting a small number of users regarding tool calls, we think this should now be resolved.

Regarding Gemini 2.5’s output performance, I don’t believe anything major has changed since the model was added that would affect its performance!

Max does cost $0.05 per request AND $0.05 per tool call due to its increased costs, but this also gives you access to the whole 1M token context window, hence the cost.
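
For example (just to make the arithmetic concrete): a single Max request that ends up making ten tool calls comes to $0.05 + 10 × $0.05 = $0.55 for that one prompt.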

If anyone has any examples (and non-privacy-mode request IDs ideally) where Gemini seems to answer poorly, do send them over and the team can analyse where it’s going wrong!

Hey Dan –

Thanks for chiming in – overall product experience has been great in the past but I think it’s worth pointing out a few things.

There’s nuance to what you’re saying, and I want to make sure we’re not playing rhetorical ballet; so I’ll just be explicit. You said “since the model was added”… assuming you mean since Cursor officially supported it (vs. users manually using their own API key).

I agree nothing has changed since Cursor officially added the model—but that totally misses my core point: the model performed better, and had better context prior to Cursor officially adding support. I don’t know the exact workflow variances internally between supported models and API-accessed models, but I can confidently assume it’s not nothing. And honestly, I’d go further and say it probably revolves around optimizations regarding context usage—because that’s exactly an initiative that would be on my plate if I were in the shoes of someone tasked with running product @ Cursor.

These models aren’t cheap—I totally get that.

That said, the “optimizations” are actually convoluting the model’s capability by trying to intelligently surface “most relevant” snippets, with the bet being something like —“Hey, win-win: we save on our cost-of-goods, and users still get great performance!”. But that just isn’t the case here. In fact, there’s been a significant drop in performance.

Here are my un-filtered thoughts + assumptions the UX and recent changes have left me with. Whether correct or not, this is genuinely what I’m left believing as a user:

The changes have eroded my confidence / trust in the product — and to some extent the team’s decisions.

Why? I can’t shake the feeling that these decisions were made hastily, as a reaction (to the Gemini 2.5 release), and driven by cost/revenue optimizations as well as preservation of the perceived value of a new feature (MAX) — with user experience and product quality taking a back seat in the decision-making framework …

Let me explicitly map out the constellation of thoughts and some speculation that led me to this train of thought:

  • ~last week: Sonnet 3.7 MAX launches — MAX’s core value clearly being larger context windows. I get this costs more. It’s a premium model, at a premium price, with premium cost-of-goods. I’m assuming this brought a nice bump in revenue—seems like Cursor’s first “premium product,” evidenced by attention-grabbing gradients, persuasive behavior-driving copy in the model picker pop-overs, etc. I’m not going to assume you’re saying “hurray” for profits just yet, but this probably helps offset venture subsidies or losses and makes a great bullet point in an investor update or fundraising deck. Again—I get it.
  • Then, days later: Gemini 2.5 rolls out. A micro black-swan event — It’s free. And it has a million-token context window.
  • Then the internal “oh shît” moment: “This entirely invalidates the value of MAX… you know, that premium product we just dedicated a cycle to building? shît.”
  • Then, I can only assume, a brief internal scramble lands on extending MAX pricing to Gemini—charging users the same premium for a free model.

So as a user, my thought process goes as follows:

  • “I’m now paying for a model I know to be free”.
  • “Ok, if I’m not using MAX, my files/context get artificially limited – it’s contextualizing only portions of what I am providing.”
  • Then: “So basically I have to use MAX to unlock the full context.”
  • Then: “But wait—again … the model is free”
  • Then: “ohhhh they are limiting / optimizing the context window in the first place … everywhere? Universally? got it … interesting.”
  • Then: … “Oh, this creates the pain point solved by their new product, MAX – it will drive users to this. seems like a dark pattern …”
  • most importantly: "I can’t confidently assume cursor is using the context I’m providing … now it’s being subjected to some other intermediary filtering process? – a potential point-of-failure?"
  • Then: “Oh, I think I remember reading something about frontier embedding/semantic search model something something blah blah — whatever…” (we’ll circle back to that)
  • Then, I land on: “Whatever—fuçk it—I’ll just pay for MAX and move on.”

… So I switch to that gradient-laden MAX model because “whatever — the API was awesome and this will just be better now that it’s integrated and getting the uncapped context window. I guess I’ll pay to get that experience back.”

But no, the performance was worse — and here’s what really stood out: during Cursor’s backend deployment the other day when the service went down, I attempted to use my own API key—exactly as I had done all day—and suddenly an alert popped up. I’ve only seen this alert during that deployment (probably updated since, because the copy was terrible). It said something along the lines of: “You can’t use your own API key with Agents, because Agents rely on proprietary internal models—and those cost money. Etc …”

This leaves me with an “ah-ha” moment and two immediate thoughts:

  • “Ok, maybe they’re incurring real costs from all this free usage. Whatever, fair enough—I’ll pay.”
  • But after experiencing how drastically worse the performance became, I’m now left with the impression that — universally — there’s some dumber intermediary model (be it whatever tool dispatch model + semantic/embeddings model) in the ether, preventing the smarter model from working the way it should —by serving bite-sized, context-optimized bits geared toward Anthropic’s way of handling context. While this approach might work well for Anthropic models, Google’s models aren’t Anthropic’s—they handle context very differently. This introduces a fundamental hindrance to the model’s capabilities, rather than letting Google’s model do what it does best (in Google’s case, maximum context = better performance).

So at that point:

  • I already have a sour taste in my mouth for paying a premium for a free model.

  • Then I start seeing erroneous tool calls to “read” files that I’ve already explicitly provided?

  • I think: "What’s the point of paying extra for MAX to have extended context, only for it to ignore the files I’ve provided in that context? And why is it now triggering a premium tool call to read a file I’ve already provided?" And ultimately, is it actually even using the “premium” context window I’m providing and now paying a premium for? Or am I being misled? (Further erosion of confidence/trust in the product)

  • This happens commonly — and even more frustrating, it will have a series of these calls end in something like “Great! I’ll work on this now …”
    … and then crickets … the chat run narcoleptically ends and does nothing.

    If you want a laugh — read the degradation of fuçks given in my thumbs-down / Send feedback notes every time this has happened. offensive and likely incoherent enough to make great HQ coffee-mug-decals.

… and scene ….

Anyways — your response doesn’t necessarily scream to me that you guys understand this as a pressing issue — more a “let’s throw it in the backlog — seems fine to me” kind of thing. But I’m clearly not the only one with the issue, which is why I wanted to dump my raw thoughts, and I have more I don’t feel like articulating right now.

So, summing things up:

  • Ambiguity about how user-provided context is genuinely utilized or semantically filtered through internal workflows/optimizations leads to confusion and erodes my confidence and trust in the product.
  • I trust the product less because I can no longer confidently assume Cursor is actually using the context I’m explicitly providing it.
  • I’m either being charged for erroneous, unnecessary tool calls, or I’m being misled about how context is utilized in MAX.
  • I’m paying a premium for a feature that leaves me not trusting whether it performs any better than the non-premium version of the feature (because both use opaque interstitial models)
  • Charging premium prices for models freely available elsewhere creates frustration.
  • Agent performance is obviously platform-dependent, clearly optimized for the Anthropic “vibe.”
  • The performance when it was free wasn’t better because it was free—it was better because it was UNLEASHED!! lol
  • I’m paying for a model I currently know is free, and I can’t use my own API key for it without degraded features.

Here are a few suggestions:

  • Consider better context transparency around what context is/was actually provided during a request.

  • Consider reducing cognitive overhead + cumulative frustration: Maybe re-jiggering the pricing model / economics to blend “read”, grep, etc so they are “free” to the end-user so we don’t get pîssed counting 5¢ every time this thing reads a file,
    or worse yet — reads a file 100 lines at a time
    … for a file already provided …
    … in a supposed “premium” context window …
    … Then fails to provide output …

  • Consider a “Passthrough” mode:

    • Only enabled for API Key usage
    • Must still be a pro member (so help me … do not gate something like this to enterprise)
    • Uses ALL context you provide
    • Tool calls burn a fast request (minus grep)

Look, I’m writing all this because I’m invested in the product and want to see it succeed. I’ve spent considerable time with Cursor and believe it has massive potential. But these recent changes feel like a step backward in user experience for the sake of metrics. I’d really appreciate it if the team could take a step back and reconsider the current implementation - or at minimum be more transparent about how context is being managed.

love,

ash

5 Likes

I just got my Gemini API key working by upgrading Cursor, and I tried using 2.5 MAX but didn’t notice any additional charges, and my number of requests isn’t increasing. I don’t have usage-based pricing enabled, and everything seems to be working without charge through my API key.

Are you being charged directly in real time, or where are you seeing the $0.05 per-request charge?