There’s been a lot of discussion about whether Auto mode quietly picks cheaper models for complex tasks. I wanted actual data instead of vibes, so I ran a comparison.
What I did: Created 5 tasks at increasing difficulty (loading spinner → refactor → React debugging → architecture design → test suite debugging). Ran each one twice: once with Auto, once with Sonnet 4.5 manually selected. Same prompts, same project, same files.
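For anyone who wants to reproduce this, here's roughly how the per-task timing can be scripted in Node/TypeScript. Note `runTask`, the model names, and the prompt path are all placeholders — the post's actual CLI invocation isn't shown, so this is just the shape of the loop, not the real command:

```typescript
// Sketch of the per-task timing loop. runTask is a stub standing in for
// however you actually invoke the agent CLI (not shown in the post).
import { performance } from "node:perf_hooks";

async function runTask(model: string, promptFile: string): Promise<void> {
  // Placeholder: a real run would spawn the CLI with the given model and
  // prompt file, then await completion. Here we just simulate work.
  await new Promise((resolve) => setTimeout(resolve, 25));
}

async function main(): Promise<void> {
  for (const model of ["auto", "sonnet-4.5"]) {
    const t0 = performance.now();
    await runTask(model, "prompts/01-loading-spinner.md");
    const seconds = (performance.now() - t0) / 1000;
    console.log(`${model}: ${seconds.toFixed(1)}s`);
  }
}

main();
```

Same prompt file for both runs, wall-clock time from start of invocation to process exit.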
Results:
| Task | Auto | Sonnet 4.5 | Faster |
|---|---|---|---|
| Loading spinner (simple) | 21.6s | 21.5s | Tie |
| Refactor 200-line function (medium) | 41.4s | 36.9s | Sonnet |
| Debug stale React data (complex) | 26.0s | 28.8s | Auto |
| Architecture design | 66.1s | 83.4s | Auto |
| Shared state test bug (reasoning) | 44.6s | 39.9s | Sonnet |
Auto was actually faster on the two hardest tasks. Output quality looked the same across all 5, and both solved every task correctly, as far as I could tell from reviewing the generated code.
The one interesting difference was on the test debugging task: Auto moved the setup into a global beforeEach, while Sonnet called .clear() on the shared state inside the affected test. Both are valid fixes, just different approaches.
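To make the two fixes concrete, here's a hypothetical reconstruction — the post doesn't show the actual project code, so the names (`cache`, `recordHit`) and the Jest-style hooks in the comments are made up:

```typescript
// Hypothetical stand-in for the shared state that leaked between tests.
const cache = new Map<string, number>();

function recordHit(key: string): number {
  const next = (cache.get(key) ?? 0) + 1;
  cache.set(key, next);
  return next;
}

// Approach A (Auto): reset everything in a global beforeEach, e.g.
//   beforeEach(() => cache.clear());
// so every test starts from a clean slate regardless of ordering.
//
// Approach B (Sonnet): the affected test clears the shared state itself:
//   cache.clear();
//   expect(recordHit("user:1")).toBe(1);

// Demonstrated without a test runner:
recordHit("user:1");              // simulates an earlier test polluting state
cache.clear();                    // both fixes boil down to this reset
console.log(recordHit("user:1")); // → 1, counting starts fresh again
```

Approach A is more defensive (no test can forget to clean up), while Approach B keeps the fix local to the one test that cares — which is probably why both passed.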
Big caveat: I ran this through the CLI, which doesn’t show which model Auto actually selected. So I can see outcomes were equivalent but can’t tell you the routing. If anyone knows how to pull that metadata I’d love to re-run this with visibility into what Auto is actually picking.
Other caveats: Small sample (5 tasks), clean TypeScript/React project, one language. A messy real-world codebase might behave differently.
Based on this though, the “Auto is cheating you” concern didn’t hold up. At least not for these task types.
Anyone else tested this? Curious if your results match or if there are specific scenarios where Auto clearly picks wrong.
