Ok fine, one more round
Yes, I imagine you've worked out vendor agreements directly, and this is neither here nor there, but when you say the pricing is "not as straightforward," that tells me you likely have a sweetheart deal with Anthropic (which, smart: I can't think of a better opportunity for a model provider to cement mindshare with the dev community). Reading between the lines, with a dash of speculation, you're also implying the deal isn't quite as sweet with Google. Again, neither here nor there, but noted… and tell Logan I said hook it up!*
Anyway, this still underscores a point from above:
… this will inevitably impact sentiment in community chatter (… hi …), and your perceived value will be scrutinized as long as the product underperforms relative to the baseline of the models it is wrapped around, or relative to prior releases. (And just so we're clear: I don't mean "wrapper" in a derogatory way.)
But back to the core point of all of this, you touch on it here:
I think this confirms quite a bit, and unless you tell me otherwise, one takeaway I believe we all have from this is:
Cursor's pre-processing is truncating provided context, MAX included. Best efforts or otherwise, trimming context is a feature, not a bug.
When you say "they still perform better given only what they need and nothing more," it's framed in a way that seems impossible to argue with… but the core issue that began this entire discussion is that the product is simply doing a bad job of determining what the model "needs".
And my point is: Different models have different needs.
… and the Gemini series of models happen to be the only models that perform better given more context. Feed the geese!
Now, regarding MAX: when you say the "criteria is much lower," what does this mean?
- Is the top-k threshold lower (i.e., semantic search returns more results)?
- Are chunks bigger?
- Does it stack-rank context based on semantic results and then "automagically" decide how/when to provide full files vs. chunks vs. outlines? And if so, does MAX just have a larger bias toward exposing more of the larger pieces? (My naive mental model of this pipeline is sketched right after this list.)
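For what it's worth, here's how I imagine that pre-processing step works. Every name, threshold, and default below is my guess, not anything Cursor has confirmed:

```python
# Purely hypothetical sketch of how I imagine the context pre-processing works.
# Every name, threshold, and number here is a guess, not anything Cursor has confirmed.
from dataclasses import dataclass

@dataclass
class Chunk:
    path: str      # source file the chunk came from
    text: str      # the chunk itself
    score: float   # semantic similarity to the query
    tokens: int    # size of the chunk

def assemble_context(chunks: list[Chunk], top_k: int, min_score: float, budget: int) -> list[Chunk]:
    """Stack-rank retrieved chunks, then keep adding until a threshold or budget cuts us off."""
    ranked = sorted(chunks, key=lambda c: c.score, reverse=True)[:top_k]
    kept, used = [], 0
    for chunk in ranked:
        if chunk.score < min_score:        # the "criteria" -- is this what's lower for MAX?
            break
        if used + chunk.tokens > budget:   # a hard cap well below the advertised window?
            break
        kept.append(chunk)
        used += chunk.tokens
    return kept
```

Is MAX effectively the same function with a bigger `top_k`, a lower `min_score`, and a bigger `budget`? Or something else entirely? That's the level of detail I'm after.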
The reason I'm asking these kinds of questions: the better I (we, the people) understand this, the better I can engineer how I provide context. If you can offer any useful "tricks" or steps we can take to ensure we're getting the best, most coherent output, I'm all ears.
Just one final note to consider feeding up the chain…
MAX is a road to nowhere.
… as it functions today …
Here’s why:
1. The Fundamental Disconnect
MAX suffers from a fundamental misalignment between promise and delivery. When you market a feature based on raw numerical capability (1M tokens) but then silently filter and manipulate the provided context (context window = instructions + cursor rules + tools + provided context + query), you’re creating cognitive dissonance for your users.
We’re sold on infinite headroom but find ourselves in a house where the ceiling keeps randomly dropping. It’s like selling a Ferrari with a governor that keeps it under 55mph – technically it’s still a Ferrari, but you’ve neutered the very thing that made it special.
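To put that formula into numbers, here's the kind of back-of-envelope math I mean. Every figure below is a made-up placeholder, just to illustrate how fast the advertised window shrinks once the product decides what you "need":

```python
# Back-of-envelope math on the context-window formula above.
# All figures are made-up placeholders, not anything Cursor has published.
advertised_window   = 1_000_000  # what MAX is marketed on
system_instructions =     5_000  # assumption
cursor_rules        =     2_000  # assumption
tool_definitions    =     3_000  # assumption
query               =     1_000  # assumption
provided_context    =    70_000  # roughly what the filtering seems to let through

used = system_instructions + cursor_rules + tool_definitions + provided_context + query
print(f"{used:,} of {advertised_window:,} advertised tokens used ({used / advertised_window:.1%})")
# -> 81,000 of 1,000,000 advertised tokens used (8.1%)
```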
2. Too Amorphous. Too Cerebral
Given Cursor is interfering with the provided context (which is different from the context window), the value proposition becomes too jumbled and amorphous, too intangible for a paying user to be left with a "wow" when they use it. Selling an expanded context window as a core selling point is also too cerebral; to quote Wolf of Wall Street: "Fuçking digits — all very acidic, above-the-shoulders, mustard shît. It kind of wigs some people out."
It's a supposedly impressive number, but a confusing veneer over something of questionable intrinsic value.
Users who get excited about "1M tokens" will quickly discover they're paying a premium for a black box that might be using 70k tokens or less, and they feel the disappointment as they stare at a punishing trail of useless tool calls for linting errors and unnecessary "read" calls that they can point to and ask, "was that worth 5¢?" You're rubbing it in their face.
Re: the black box:
"Good design is honest." – Dieter Rams
3. The Perception Problem
Each interaction with MAX creates multiple moments of friction:
- Value Opacity: Users can't perceive what they're actually getting. Is my entire codebase being considered? Only parts? Which parts? The black box creates uncertainty.
- Cognitive Overhead: Unlike almost any other product, users see each transaction itemized in real time. Imagine if Netflix showed you a running cost counter for each minute you watched.
- Punishment Loop: Every failed interaction isn't just disappointing; it's a series of small paper cuts in the form of visible tool calls and charges that remind the user, "this didn't work AND I paid for it." (Worth mentioning: this overhead is exclusive to MAX. I know I'm rolling the dice and not thinking twice with non-MAX, because that pricing model is significantly better framed.)
The current implementation is like the water torture of consumer pricing psychology. Drip.
Believe me, it's not about the 5¢; it's about the perceived value, the resulting output, and effectively being faced with a trail of receipts each time you use it. If it doesn't work, makes 10 tool calls, and edits a bunch of files unnecessarily, it's like staring at 10 receipts that are just there for the user to count. It's annoying, and it makes users think (and anguish) far too much about things they shouldn't need to care about.
Every interaction presents an opportunity for an end user to question the value of the product, and you are ultimately staking your reputation on the reliability of the model. If the model screws up 5 times, you look bad 5 times. When a user can directly attribute that bad response to something they paid for, it's an unnecessarily bad user experience.
At the end of the day, using an LLM is still like pulling a slot machine with lower than 50% odds – and while models are improving, the value proposition is in intelligence, not raw context size. Especially when Cursor is trying to be too clever by half with context filtering.
4. The Reputational Calculus
You’re currently setting up a dangerous equation:
User disappointment × Visible costs × Frequency of use = Rapidly deteriorating brand loyalty
When your product regularly shows users they’re paying for disappointing results, you’re training them to look elsewhere. This is basic behavioral economics—you’re creating a negative reinforcement loop with your own product.
The promise is too difficult to fulfill.
If Cursor manipulates provided context in ways users can’t predict, control, or even observe, it cannot honestly market MAX based on context window size. It’s a promise that – despite my determined testing – I can’t verify is being fulfilled.
Without transparency, you’re setting up users for the psychological equivalent of biting into what looks like a cream-filled donut and finding it hollow.
Short-term gains vs. reputational risk:
It's undoubtedly a boost for you guys; I can run the numbers on my usage alone and extrapolate from there. But you're burning through reputational capital at an alarming rate, and it worries me for you.
Onward and Upward
Stop selling MAX as a context window size upgrade. Instead, position it as a comprehensive solution for working with large codebases—then actually build that solution with intentional features.
Some concrete ideas:
- Context Visibility: Create an interface showing exactly what files and chunks are being included in context. List them, in order, prior to each "thinking" call, and even give it a second before collapsing so it shows its value. It should show each portion of the provided context: collapsed by default after a second, but expandable into a list of files (each of which can also be opened to show whether it went in as a full file, partially, or as an outline). A rough sketch of the kind of data this panel could surface follows this list.
- Context Control: Build tools letting users explicitly prioritize critical files/directories. Maybe drag-and-drop to sort in the input window.
- Learn from Users: Use ML to build context on a per-user basis.
- Predictable Pricing: Move to a higher subscription tier (naturally, "Ultra"). Use fast credits, whatever. Blend costs and eliminate the cognitive overhead. Period.
- Value Anchoring: Bundle additional premium features that increase perceived value beyond just context size.
- More tool calls still make sense to me.
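As promised above, here's a rough sketch of the kind of data a Context Visibility panel could surface. The names and shapes are entirely mine; nothing here reflects Cursor's actual internals:

```python
# Rough sketch of the data a "Context Visibility" panel could surface before each call.
# Names and shapes are entirely mine; nothing here reflects Cursor's actual internals.
from dataclasses import dataclass, field
from enum import Enum

class Inclusion(Enum):
    FULL_FILE = "full file"
    CHUNKS = "partial (chunks)"
    OUTLINE = "outline only"

@dataclass
class ContextEntry:
    path: str
    inclusion: Inclusion
    tokens: int
    pinned: bool = False  # "Context Control": the user explicitly prioritized this file

@dataclass
class ContextReport:
    entries: list[ContextEntry] = field(default_factory=list)

    def summary(self) -> str:
        """What the collapsible panel would show before each 'thinking' call."""
        total = sum(e.tokens for e in self.entries)
        lines = [f"{len(self.entries)} files, ~{total:,} tokens of provided context"]
        for e in sorted(self.entries, key=lambda e: (not e.pinned, -e.tokens)):
            prefix = "[pinned] " if e.pinned else ""
            lines.append(f"  {prefix}{e.path}: {e.inclusion.value}, ~{e.tokens:,} tokens")
        return "\n".join(lines)
```

Pinned entries sort first, everything else by size, so a user can see at a glance what actually made it in and what got demoted to an outline.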
Remember: great products feel inevitable, not experimental. Right now, MAX feels like a beta feature that escaped the lab too early, and users are paying to be … subjected to water torture.
Your reputation isn't just on the line – it's actively being spent down with each disappointment. This is fixable, but it requires rethinking, and fast, to avoid a complete depletion of reputational capital.
Strong opinions, weakly held.
- Ash
Last question: how, precisely, can I make use of the full 1M (or 200k) context window in MAX?
Every prior attempt I've made has failed; at best, I've barely been able to use 130k tokens because of the filtering.
I'd love a straightforward answer to this.
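If it helps frame the question, here's a back-of-envelope way to estimate how many tokens the files I attach should occupy (using tiktoken as a stand-in tokenizer, since it isn't Gemini's or Claude's, so it's a ballpark only):

```python
# Ballpark estimate of how many tokens a set of attached files *should* occupy.
# tiktoken ("cl100k_base") is OpenAI's tokenizer, not Gemini's or Claude's,
# so treat the total as an approximation only.
import pathlib
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def estimate_tokens(paths: list[str]) -> int:
    return sum(
        len(enc.encode(pathlib.Path(p).read_text(errors="ignore")))
        for p in paths
    )

attached = ["src/app.py", "src/db.py"]  # hypothetical files @-mentioned in the prompt
print(f"~{estimate_tokens(attached):,} tokens attached")
```

When that estimate is well above 130k but the model behaves as if it saw far less, that's the gap I'm describing.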
PS I would strongly recommend reading the book “Predictably Irrational” by Dan Ariely—one of the better books for consumer behavior/pricing psychology. Everything I’ve described above is like textbook “what not to do.”
PPS Even if you didn’t read this I still consider this a win because I was able to tie a Dieter Rams quote in with a quote from Wolf of Wall Street and it still made sense.
PPPS: I don’t actually know Logan, but if I did, I’d tell him the same thing.