Compute Usage Max Mode

Cursor now has a new pricing model on the Pro subscription based on compute usage, so it depends on how much you use. I have a basic question about this:

If pricing is based on compute usage, does that mean there should be no difference between using MAX mode with low context and using normal mode?
Am I correct?

@danperks
could you please clarify this?

Low context usage in MAX was always recommended as it reduces token usage.
Basically limiting unnecessary context saves cost and speeds up processing.

Compute usage in AI is related to model complexity/weight (Opus is more compute-intensive than Sonnet) and tokens (the more tokens, the more processing).

Using the same model in MAX with a low context amount versus a high context amount makes a huge difference in resource usage and cost.
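As a rough back-of-the-envelope sketch (Python, with placeholder per-token rates that are not Cursor’s or Anthropic’s actual pricing), both factors show up clearly:

```python
# Illustrative only: rates are placeholders, not actual Cursor/Anthropic pricing.
# Cost scales with both the model's per-token rate and the number of tokens.

RATE_PER_MTOK = {        # hypothetical input-token rates, $ per million tokens
    "sonnet": 3.0,
    "opus": 15.0,        # Opus is more compute-intensive, as noted above
}

def rough_cost(model: str, input_tokens: int) -> float:
    """Very rough input-side cost estimate: rate * tokens."""
    return RATE_PER_MTOK[model] * input_tokens / 1_000_000

for model in ("sonnet", "opus"):
    for tokens in (10_000, 100_000):
        print(f"{model:6s} {tokens:>7,} tokens -> ${rough_cost(model, tokens):.4f}")

# sonnet  10,000 tokens -> $0.0300
# sonnet 100,000 tokens -> $0.3000
# opus    10,000 tokens -> $0.1500
# opus   100,000 tokens -> $1.5000
```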

For example, I’m using Claude 4 with 10k tokens of context usage.

Is that any different from using Claude 4 MAX with 10k token usage?

If it is different, which one is better and why?

There are a few different things:

  1. There is no plain “Claude 4”, and this sometimes causes misunderstandings. The models are Claude 4 Sonnet (smaller/good/decent cost) and Claude 4 Opus (larger/better/5x more costly per Anthropic), two very different models. I wish model providers would make the naming simpler, like just Opus 4 or Sonnet 4. :slight_smile:

  2. We can’t really compare non-MAX and MAX based on just one single number (tokens). Measured purely by tokens they would be the same, but the results may not be.

Sure, MAX is more capable, but it may also run into much higher token usage; let’s say it is less predictable in consumption.

If a budget limit is important, then non-MAX is better, as it is predictable. It is also less costly for simpler tasks, where MAX would be overkill.

If budget is not so critical and there are complex requirements or big architectural changes, MAX can get more done. It can use more context to handle the tasks, but if not used carefully it can cost much more than regular mode.

Personally I manage the context carefully even with MAX, but I had to practice and learn that to keep MAX from getting too costly in token usage. Plan a task with only the relevant details, without overcomplicating it, focused only on the specifics that matter, and ask MAX to perform the task with that narrow focus attached.

Overall, if we are talking about 10k tokens, then don’t use MAX. That’s less than 10% of the regular Sonnet 4 context in Cursor.

Don’t focus on the 10k tokens, that is just a sample number.
Focus on compute usage.

I just want to know: if my request uses 10k tokens, does that mean there is no difference between MAX and normal mode, right?

Because the previous Pro subscription had tool-call pricing, which would make a difference; now we don’t have that.

Sure, 10k tokens is 10k tokens regardless of MAX or non-MAX.

The detailed answer is not so simple :slight_smile:

Here is an example:

Anthropic reports compute usage as tokens in the API, from what I know based on the documentation they provide.

A standard AI API request usually has only one tool call. Cursor’s Agent allows more than one tool call per ‘request’ (prompt), so there is some logic in their internal handling. I’m answering based on my experience with how AI works, not on any direct info from the Cursor team.

It is important to know that even if you give the exact same prompt to the same AI model, it won’t do exactly the same thing each time. Comparing two requests on the same model and same mode (thinking or not, MAX or not) will result in two different token usages at the end. Similar maybe, but different.
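As a quick illustration of both points (usage is reported as token counts, and identical prompts don’t produce identical usage), here is a minimal sketch with the Anthropic Python SDK; the model ID is just an example and the script assumes an API key in the environment:

```python
# Minimal sketch: send the same prompt twice and compare reported token usage.
# Requires the `anthropic` package and ANTHROPIC_API_KEY in the environment.
import anthropic

client = anthropic.Anthropic()
prompt = "Summarize what prompt caching does in two sentences."

for run in (1, 2):
    resp = client.messages.create(
        model="claude-sonnet-4-20250514",   # example model ID
        max_tokens=300,
        messages=[{"role": "user", "content": prompt}],
    )
    u = resp.usage
    print(f"run {run}: input={u.input_tokens} output={u.output_tokens}")

# Input tokens should match (same prompt), but output tokens usually differ
# run to run, so total consumption is only ever "similar", not identical.
```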

Total consumed tokens are combined from:

  • Input tokens (prompt/context, incl. reading from tools)
  • Output tokens (code and text output, incl. writing to tools)
  • Cache write (chat history needed by model to ‘remember’)
  • Cache read (chat history needed by model to ‘use in next step’)

So from what I know, the same X total tokens used may still result in a different cost, depending on how the tokens are spread across input, output, cache write, and cache read, since those four options are priced differently.
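To make that concrete, here is a tiny sketch with placeholder rates (the four buckets follow Anthropic’s pricing structure, but the numbers are made up): the same 10k total tokens can cost quite different amounts depending on the spread.

```python
# Same total tokens, different spread over the four priced buckets.
# Rates are placeholders in $ per million tokens, not actual pricing.
RATES = {"input": 3.0, "output": 15.0, "cache_write": 3.75, "cache_read": 0.30}

def cost(usage: dict) -> float:
    return sum(RATES[bucket] * tokens / 1_000_000 for bucket, tokens in usage.items())

# Two requests, both 10,000 tokens in total:
mostly_cached = {"input": 1_000, "output": 1_000, "cache_write": 0, "cache_read": 8_000}
mostly_output = {"input": 2_000, "output": 6_000, "cache_write": 2_000, "cache_read": 0}

print(f"mostly cached reads: ${cost(mostly_cached):.4f}")   # ~$0.0204
print(f"output heavy:        ${cost(mostly_output):.4f}")   # ~$0.1035
```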

Yes, the current Cursor plan does not charge tool calls separately, but the actual resource usage, which includes content going to and from tool calls, is still resource usage. It just doesn’t matter how many tool calls were needed.

So if we simplify all this and assume there is just a single tool call to read a file and another tool call to write a file, with a total of X tokens used at the end of the request, then it would likely be very similar in non-MAX and MAX mode. (In reality it isn’t that simple.)

OK, got it.
I already know how Claude works; it also depends on the various input/output/cache tokens, and I believe Cursor handles it the same way.

I just want to clarify MAX vs normal mode.
So with the new subscription, the main difference is that MAX mode allows us to use 200k context with Claude, while normal mode allows 120k.

So the conclusion is:

If we have 10k tokens of usage (same tool calling, input/output/cache),
there will be no difference between normal and MAX.

Thanks for your explanation.
