The "Whole 200k Context Window" of Claude 3.7 Sonnet Max

I’ve spent considerable time (and yes, money too) thoroughly verifying this finding. This wasn’t just a one-off test but a methodical investigation to ensure my observations were consistent and accurate. As of now, anyone can verify this finding in their own Cursor Ask. The cost of running this experiment? A maximum of $0.05 USD - a small price to pay for uncovering the truth behind the marketing claims.

I was quite excited when Cursor announced Claude 3.7 Sonnet Max with its promised “whole 200k context window.” As someone who regularly works with large codebases and documentation, this seemed like a game-changer!

After eagerly testing this impressive feature, I discovered something interesting - any message exceeding approximately 70,000 tokens simply won’t be processed. The system just refuses to handle it.

The silver lining? At least Cursor doesn’t charge for these rejected requests! So we have that going for us, which is nice. :smirking_face:

I understand that implementing true 200k context handling is challenging from a cost perspective, but perhaps the marketing should match the actual capabilities? It feels a bit like ordering a 12-inch pizza and receiving a 5-inch one with a note saying “we’re working on making our ovens bigger.”

Has anyone else encountered this limitation? Are there workarounds I’m missing, or is the “200k context window” currently more of an aspirational feature than a functional one?

Looking forward to hearing your experiences!

8 Likes

Lol ok, dramatic approach to an issue :slight_smile:

I can imagine it’s not just a cost challenge. Cost is the simplest part, because if things cost more, Cursor would simply have to charge more. And it’s fair that Cursor didn’t charge you for failed requests.

The tricky part is actually the handling of the context.

I assume you used Agent.
Going by my own code, 70k tokens would be about 3,500 lines of code. That’s without any rules/rule files, MCPs, …
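
If you want to sanity-check that lines-to-tokens ratio on your own repo, here is a rough sketch (it assumes the `tiktoken` package is installed; Claude uses a different tokenizer, so treat the numbers as an approximation):

```python
# Rough estimate of tokens per line of code for a repo.
# Assumes `pip install tiktoken`; cl100k_base is an OpenAI encoding, so the
# counts only approximate what Claude's tokenizer would produce.
from pathlib import Path
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

total_tokens = 0
total_lines = 0
for path in Path("src").rglob("*.py"):  # adjust the glob to your codebase
    text = path.read_text(errors="ignore")
    total_tokens += len(enc.encode(text))
    total_lines += text.count("\n") + 1

print(f"{total_tokens} tokens over {total_lines} lines "
      f"(~{total_tokens / max(total_lines, 1):.1f} tokens/line)")
```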

Have you tried enabling “Large context” settings?

The docs say: “Currently offered as an option for Claude 3.7 Sonnet, MAX mode provides a 200k token context window, 200 Agent tool call limit, and the ability to process up to 750 lines per file read operation.”

Is it likely you hit the 750-line read limit instead?

Who cares?! SuperDuperMAX might be out tomorrow and by then Claude3.7-Thinking won’t be able to outperform a pocket calculator!

2 Likes

I anticipated this condition, so I used “Ask”.

Wow, really intriguing. The worst part is knowing that I ran the same test on ro-cline and it actually read the files, around 500k tokens’ worth. If something that scrappy can do it, why can’t Cursor?

1 Like

PLEASE CURSOR JUST CHARGE PER TOKEN AND LET US USE THE WHOLE CONTEXT. MODELS ARE GETTING BETTER OVER LONG CONTEXT AND THIS REGIME OF DRIPPING IN TOKENS WON’T WORK FOR MUCH LONGER

1 Like

Even 70k input tokens cost around 40 cents, and then there are output tokens, the indexing DB, and other service costs, which can easily add up to 10x per request (you pay 5 cents, they spend 50 cents). That’s why they hope you won’t use the whole context, or that, on average, you build that 70k of context over 10 codebase reads, so their costs are covered.
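
For anyone who wants to redo that back-of-the-envelope math, here is a minimal sketch assuming Anthropic’s published list prices for Claude 3.7 Sonnet ($3 per million input tokens, $15 per million output tokens); caching discounts and Cursor’s own infrastructure costs are not modelled:

```python
# Back-of-the-envelope API cost for a single request.
# The prices are Anthropic's list rates for Claude 3.7 Sonnet (USD per
# million tokens); they are an assumption here and ignore prompt caching.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
            + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK)

# 70k tokens in and a modest 8k tokens out:
print(f"${request_cost(70_000, 8_000):.2f}")  # ~$0.33
```

Even at list prices, one fully loaded request already costs several times the 5 cents you pay, before any tool calls.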

Do they have to be profitable now? They are still a startup :slight_smile:

Netflix made billions in losses over a decade before it turned a profit. So do most startups that are actually widely successful.

Not saying they have to lose money, but I doubt they have financing issues, as they got new funding recently.

A similar discussion is ongoing in another thread, with the same conclusions. It makes no sense for Cursor to artificially limit tokens and cripple the usefulness of the AI integration.

This thread rests on a false assumption and therefore on incorrect logic. Cursor limits any initial context over a certain amount, depending on the model, because the context will grow with tool requests or further messages, and you would risk being shown “max context reached”.

2 Likes

True, I see clear proof even on 3.5: when I have a chat thread that goes on a bit longer, eventually 3.5 starts hallucinating details.
Then I switch to a new chat and it’s like new.

That’s about models losing track of the middle of the context after around 2k lines, which is a very well-known problem. Here the discussion is about Cursor’s initial context and the limit it imposes in order to let the context grow if tools are called (reading more files, using MCP), which is something @ransxd didn’t take into consideration.

1 Like

Bro, you don’t need to limit it to 70k for that. 150k will do; the code being generated isn’t that many tokens, and you need to start a convo anyway. Also, Gemini 2.5 has a 1-2 million token context window. We’re still gonna be drip, drip, dripping in context? Ridiculous.

70k input + 70k output + 60k reasoning = 200k; a simple refactor job or documentation request will get to 200k.

Context length issues:
Source: [2501.05414] LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation

Source: [2503.06692] InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models


(Cursor does this periodic summarization as a chat goes on, which is the approach researchers tell us gives the best quality, and as a thank-you, users in this forum prefer to spam nonsense instead of getting informed.)
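
To make the periodic-summarization idea concrete, here is a generic sketch of the technique those papers discuss: once the transcript approaches a token budget, older turns are folded into an LLM-written summary. This is an illustration of the general approach, not Cursor’s actual implementation (which isn’t public); `count_tokens` and `summarize` are hypothetical callables you would supply.

```python
# Generic rolling-summarization loop. When the transcript gets close to a
# token budget, older turns are replaced by a summary and only the most
# recent turns are kept verbatim. A sketch of the technique, not Cursor's code.
from typing import Callable

def compact_history(turns: list[str],
                    count_tokens: Callable[[str], int],  # e.g. a tokenizer
                    summarize: Callable[[str], str],     # hypothetical LLM call
                    budget: int = 70_000,
                    keep_recent: int = 4) -> list[str]:
    total = sum(count_tokens(t) for t in turns)
    if total <= budget or len(turns) <= keep_recent:
        return turns
    older, recent = turns[:-keep_recent], turns[-keep_recent:]
    summary = summarize("\n\n".join(older))
    return ["[summary of earlier conversation]\n" + summary] + recent
```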

2 Likes

Where are you getting 60k reasoning tokens from? Most outputs are not 70k either.

The figure from the first paper shows degradation with output, not input, so if anything, we should cut that down further.

as a thank-you, users in this forum prefer to spam nonsense instead of getting informed

Simmer down, homie. Evals on the latest models show much better utilization of context vs even 3.5, and Claude Code shows that, in practice, reasoning over longer-context code inputs (over 70k) works just fine.

“Most outputs”… Cursor needs to stay inside the limits or it will error out; you’re not reasoning enough.

Imagine if LLMs degradated the input

Give sources and be sure it’s not marketing; I’m genuinely interested in this “much better” claim.

1. You don’t know how Claude Code internally manages the token context.
2. You’re assuming it works just fine (no research seen).
3. You expect to pay 20 times less and get the same results.

“Most outputs”… Cursor needs to stay inside the limits or it will error out; you’re not reasoning enough.

This is nonsensical, and yet you couldn’t resist a dig. This will be my last reply to you if you don’t stop the ad hominems. I understand how context windows work; most code outputs aren’t close to 70k, so there is no need to optimize for that at the expense of more common use cases. And there are ways to account for that too.

Imagine if LLMs degradated the input

I don’t know what “degradated” means.

Give sources and be sure it’s not marketing; I’m genuinely interested in this “much better” claim.

Look up the Gemini 2.5 numbers for https://arxiv.org/html/2409.12640v2. There are others that are trivially found for 3.7.

1. You don’t know how Claude Code internally manages the token context.
2. You’re assuming it works just fine (no research seen).
3. You expect to pay 20 times less and get the same results.
1. False. 2. False. 3. False. The third one is patently obvious; I’ve said several times I’d like them to charge by token.

Edit: also, you keep asking for sources without sourcing your 60k thinking-token figure.

Maybe I was not clear enough: Cursor cannot go over the LLM’s limits or it will get an error or a truncated response. To prevent that, they estimate the maximum input context.
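
That estimation step is simple to illustrate: with a fixed total window, reserving room for the reply, reasoning, and later tool results leaves the maximum safe initial input by subtraction. A sketch with illustrative reservation numbers (the actual values Cursor uses aren’t published):

```python
# Estimate the largest initial prompt that still leaves headroom for the
# model's reply, extended thinking, and later tool-call results.
# All reservation values below are illustrative assumptions, not Cursor's.
CONTEXT_WINDOW = 200_000        # Claude 3.7 Sonnet MAX window
RESERVED_OUTPUT = 64_000        # assumed budget for the reply
RESERVED_REASONING = 32_000     # assumed budget for extended thinking
RESERVED_TOOL_RESULTS = 30_000  # assumed room for file reads / MCP output

max_initial_input = (CONTEXT_WINDOW - RESERVED_OUTPUT
                     - RESERVED_REASONING - RESERVED_TOOL_RESULTS)
print(max_initial_input)  # 74000, in the same ballpark as the observed ~70k cutoff
```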

Are you the same person talking about ad hominem attacks? Living by example would be better.


It’s Gemini 1.5. Anyway, it shows a clear decline after 2k and a smaller one after 8k; after 128k, Gemini, contrary to the others, shows a “performance stabilization”, which gives positive vibes about future long-context awareness. For current workflows it’s better to stick under 16k of context, even with recent models.

Yes, nothing about your replies is intelligible or cogent (hard to tell which one, or both, since they are mimics of each other). The Gemini 1.5 data are irrelevant in isolation; Google released benchmarks for 2.5, and you can do a search for those. Look up what ad hominem means, then show your work on my reply and stop deflecting. Address my other points or your replies are spam, and I won’t be contributing to more spam by replying further.

1 Like

It was your paper link that claimed it’s 2.5; give the correct paper then.

“An ad hominem is a logical fallacy where an argument attacks the person making the claim rather than addressing the substance of the argument itself.”
Attacking someone because of incorrect grammar is an ad hominem attack.

I told you I don’t want marketing; is that intelligible or cogent enough for you?

The community considers benchmarks from papers run on frontier models to be legitimate, and not just marketing. I’ll disregard your definition.

You literally attacked me several times. Point to where I attacked you.

Your claims are getting thinner and thinner, as you can see. You are continuing to grasp at ever-diminishing straws. Bye.

To the Cursor team: please look at the decompiled Claude Code, give us more control over context, and charge by token. This will only become more important as models keep getting better at longer-context reasoning, which is the trend with 3.7 and 2.5.

2 Likes