I tested the opposite. On a 2000-file codebase, I wrote system prompts that hard-prohibit Opus from reading files during diagnosis and planning sessions. It must dispatch subagents (cheaper fast models) to explore, map, and draft — then synthesize their short structured returns.
Every Opus row in my usage logs is under 750K tokens. The large rows are the cheap, fast model doing exploration.
Quality did not regress. Plans are more executable, not less. Diagnosis is sharper, not vaguer. The reason: when forced to delegate exploration, Opus spends its tokens on what it’s actually better at — framing the problem, resolving ambiguity, and making architectural decisions. When left unconstrained, it burns most of its context on grep-and-read tours any cheap model could do.
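For illustration, a "short structured return" could be modeled roughly like this. It is a minimal sketch, not the format I actually use: the `SubagentReport` class, its field names, and the `within_budget` helper are all hypothetical, and the 300-word default simply mirrors the return size quoted in the token math.

```python
from dataclasses import dataclass, field


@dataclass
class SubagentReport:
    """Hypothetical shape for what a subagent hands back to the planner."""
    question: str                      # what the subagent was asked to find out
    findings: list[str]                # short factual statements, one per item
    files_touched: list[str] = field(default_factory=list)   # paths only, never file contents
    open_questions: list[str] = field(default_factory=list)  # what it could not resolve

    def word_count(self) -> int:
        # Count only the prose the planner has to read; paths are excluded.
        text = " ".join([self.question, *self.findings, *self.open_questions])
        return len(text.split())


def within_budget(report: SubagentReport, max_words: int = 300) -> bool:
    """Return True if the prose part of the report fits the word budget."""
    return report.word_count() <= max_words
```

The point of the shape is that file contents never appear in it: the planner sees conclusions and paths, and can dispatch again if it needs more.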
The token math: a subagent that maps 20 files and returns 300 words costs ~10K tokens on a fast model. Opus reading those same 20 files inline costs ~500K to 1M tokens, and that content then rides in context for every subsequent turn, so the cost is paid again on each turn. That compounding was the real leak.
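The compounding can be made concrete with a back-of-the-envelope model. This is a rough sketch under loud assumptions: every turn re-sends the whole context, there is no prompt caching or compaction, tokens are summed across models without weighting by price, and 300 words is taken as roughly 400 tokens. `BASE` and `TURNS` are made-up values for illustration, not numbers from my logs.

```python
def cumulative_input_tokens(base_context: int, added: int, turns: int) -> int:
    """Total input tokens over `turns` turns when `added` tokens enter the
    context before the first turn and stay there.

    Assumes each turn re-sends the full context (no caching, no compaction).
    """
    return (base_context + added) * turns


BASE = 5_000    # hypothetical: system prompt plus task description
TURNS = 10      # hypothetical session length

# Inline: Opus reads ~500K tokens of files, which then ride along every turn.
inline_total = cumulative_input_tokens(BASE, 500_000, TURNS)

# Dispatch: Opus only carries the ~400-token summary; the fast model pays
# the ~10K exploration cost once.
dispatch_total = cumulative_input_tokens(BASE, 400, TURNS) + 10_000

print(inline_total)    # 5050000
print(dispatch_total)  # 64000
```

Under these assumptions a ten-turn session lands near 5M tokens inline versus tens of thousands with dispatch, which is the same order of magnitude as the gap in the logs.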
The specific failure mode this prevents: Opus reasons “I need to read the spec doc to understand scope” → opens a 1000-line file → triggers 5 more reads → 7.5M tokens before any actual work starts. That row is in my logs at 2:52 PM.
The rule that matters: Dispatch is the reversible decision. Inline reading is not. When in doubt, dispatch — the downside is 10K tokens of overhead; the upside is avoiding a 5M-token session.
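One way to read "when in doubt, dispatch" is as an expected-cost comparison. The sketch below is my own framing under that assumption, with hypothetical function names: it asks how likely an inline read has to be to blow up before the 10K dispatch overhead pays for itself.

```python
def break_even_probability(dispatch_overhead: int, inline_blowup: int) -> float:
    """Probability of an inline blowup above which dispatching is cheaper
    in expectation. Both arguments are token counts."""
    if inline_blowup <= 0:
        raise ValueError("inline_blowup must be positive")
    return dispatch_overhead / inline_blowup


def should_dispatch(p_blowup: float,
                    dispatch_overhead: int = 10_000,
                    inline_blowup: int = 5_000_000) -> bool:
    """Dispatch whenever the estimated blowup risk meets the break-even point."""
    return p_blowup >= break_even_probability(dispatch_overhead, inline_blowup)


print(break_even_probability(10_000, 5_000_000))  # 0.002
```

With the figures above, the break-even point is 0.2%: if there is even a 1-in-500 chance that an inline read spirals into a 5M-token session, dispatching is the cheaper default.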
