Hi @_9056372,
The request consumption you’re seeing with Max Mode is expected behavior for legacy request-based plans. Here’s how it works:
Non-Max requests count as 1 fixed request regardless of token cost. Max Mode requests use token-based billing, where each API call’s actual token cost is converted into request-equivalents. Since frontier models like Claude Opus 4.7 with extended thinking are expensive per token, and agentic tasks make many sub-requests (tool calls, thinking steps, applies), the total request-equivalents add up quickly.
The combination of (a) many sub-requests per task and (b) high per-token cost with extended thinking at high effort is what creates the large numbers you’re seeing.
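To make the arithmetic concrete, here is a minimal sketch of how token usage might convert into request-equivalents. All prices and token counts below are assumed placeholder values for illustration, not Cursor’s or Anthropic’s actual rates:

```python
# Hypothetical sketch: converting Max Mode token usage into
# request-equivalents. Every number here is an ASSUMED placeholder.

ASSUMED_PRICE_PER_REQUEST = 0.04       # assumed $ value of one fixed request
ASSUMED_INPUT_PRICE_PER_MTOK = 15.00   # assumed $ per 1M input tokens
ASSUMED_OUTPUT_PRICE_PER_MTOK = 75.00  # assumed $ per 1M output tokens

def request_equivalents(input_tokens: int, output_tokens: int) -> float:
    """Convert one API call's token usage into request-equivalents."""
    cost = (input_tokens / 1e6) * ASSUMED_INPUT_PRICE_PER_MTOK \
         + (output_tokens / 1e6) * ASSUMED_OUTPUT_PRICE_PER_MTOK
    return cost / ASSUMED_PRICE_PER_REQUEST

# One agentic task = many sub-requests (tool calls, thinking, applies),
# each re-sending a large context as input tokens:
task_calls = [(120_000, 4_000), (150_000, 6_000), (180_000, 8_000)]
total = sum(request_equivalents(i, o) for i, o in task_calls)
print(round(total, 1))  # a single task can consume hundreds of request-equivalents
```

Under these assumed numbers, one three-call task already burns roughly 200 request-equivalents, which is why the totals climb so fast.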
To answer your questions:
- Rationale: As Colin explained in the Frontier Models Max Mode megathread, as models become more capable, a single request can vary widely in cost, so fixed-per-request pricing no longer reflects actual usage.
- 1M context without Max Mode: The 1M context window currently requires Max Mode, and Colin has confirmed there are no plans to change this. Anthropic’s removal of the long-context premium has already been passed through, so the 2x multiplier above 200K tokens no longer applies.
- Conserving requests: For tasks that don’t need 1M context, non-Max models with 200K context count as fixed requests. In Max Mode, you can also reduce thinking effort (from xhigh to high or medium) to lower the per-request token cost.
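As a rough illustration of the second tip: thinking tokens are billed as output tokens, so lowering the effort level directly cuts request-equivalents. The effort-to-token mapping and prices below are assumed for the sketch, not documented values:

```python
# Hypothetical illustration: fewer thinking tokens -> fewer
# request-equivalents per call. All numbers are ASSUMED placeholders.

ASSUMED_PRICE_PER_REQUEST = 0.04       # assumed $ value of one fixed request
ASSUMED_OUTPUT_PRICE_PER_MTOK = 75.00  # assumed $ per 1M output tokens

# Assumed average thinking tokens per call at each effort level:
ASSUMED_THINKING_TOKENS = {"xhigh": 30_000, "high": 12_000, "medium": 5_000}

def thinking_req_eq(tokens: int) -> float:
    """Request-equivalents consumed by thinking tokens alone."""
    return (tokens / 1e6) * ASSUMED_OUTPUT_PRICE_PER_MTOK / ASSUMED_PRICE_PER_REQUEST

for effort, tokens in ASSUMED_THINKING_TOKENS.items():
    print(f"{effort}: ~{thinking_req_eq(tokens):.1f} request-equivalents of thinking per call")
```

Under these assumptions, dropping from xhigh to medium cuts the thinking portion of each call’s cost by roughly 6x, before counting input and apply tokens.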