I tested Auto mode vs manual Sonnet 4.5 on 5 tasks of different complexity

There’s been a lot of discussion about whether Auto mode quietly picks cheaper models for complex tasks. I wanted actual data instead of vibes, so I ran a comparison.

What I did: Created 5 tasks at increasing difficulty (loading spinner → refactor → React debugging → architecture design → test suite debugging). Ran each one twice: once with Auto, once with Sonnet 4.5 manually selected. Same prompts, same project, same files.

Results:

| Task | Auto | Sonnet 4.5 | Faster |
|---|---|---|---|
| Loading spinner (simple) | 21.6s | 21.5s | Tie |
| Refactor 200-line function (medium) | 41.4s | 36.9s | Sonnet |
| Debug stale React data (complex) | 26.0s | 28.8s | Auto |
| Architecture design | 66.1s | 83.4s | Auto |
| Shared state test bug (reasoning) | 44.6s | 39.9s | Sonnet |

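Auto was actually faster on two of the three hardest tasks (React debugging and architecture design), while Sonnet 4.5 was faster on the refactor and the test debugging task. Output quality looked the same across all 5: both solved everything correctly, as far as I could tell from reviewing the generated code.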
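To double-check the "Faster" column, here's a small TypeScript sketch that recomputes the per-task deltas and the totals from the timings in the table. It only uses those numbers (one run per task per mode), so it can't say anything about run-to-run variance.

```typescript
// Timings in seconds, copied from the results table (one run per task per mode).
interface Timing {
  task: string;
  auto: number;
  sonnet: number;
}

const timings: Timing[] = [
  { task: "Loading spinner", auto: 21.6, sonnet: 21.5 },
  { task: "Refactor 200-line function", auto: 41.4, sonnet: 36.9 },
  { task: "Debug stale React data", auto: 26.0, sonnet: 28.8 },
  { task: "Architecture design", auto: 66.1, sonnet: 83.4 },
  { task: "Shared state test bug", auto: 44.6, sonnet: 39.9 },
];

// Positive delta = Auto took longer; negative = Auto was faster.
// Rounded to one decimal to hide floating-point noise.
function delta(t: Timing): number {
  return Math.round((t.auto - t.sonnet) * 10) / 10;
}

// Sum of all task timings for one mode, rounded to one decimal.
function total(key: "auto" | "sonnet"): number {
  const sum = timings.reduce((acc, t) => acc + t[key], 0);
  return Math.round(sum * 10) / 10;
}

for (const t of timings) {
  console.log(`${t.task}: ${delta(t)}s`);
}
console.log(`Total Auto: ${total("auto")}s, total Sonnet 4.5: ${total("sonnet")}s`);
```

Summed over all five tasks that's 199.7s for Auto vs 210.5s for Sonnet 4.5. The architecture task alone accounts for a 17.3s difference in Auto's favor, which is more than the 10.8s overall gap.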

The one interesting difference was on the test debugging task. Auto moved the setup into a global beforeEach. Sonnet called .clear() on the shared state. Both valid, different approaches.
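For anyone who hasn't hit this kind of bug, here's a minimal TypeScript sketch of the two fixes. It's hypothetical: the `cache` map, the test names and the tiny `runSuite` runner are made up to stand in for Jest/Vitest (where `beforeEach` is a global), and the post doesn't say where Sonnet placed its `.clear()` call, so here it runs in the same per-test hook for comparison.

```typescript
// Hypothetical module-level state shared by every test in the file.
let cache = new Map<string, number>();

type Hook = () => void;
type Test = { name: string; run: () => void };

// Minimal stand-in for a test runner: calls the optional hook before each
// test (like beforeEach in Jest/Vitest) and returns the names of failed tests.
function runSuite(tests: Test[], beforeEachHook?: Hook): string[] {
  const failed: string[] = [];
  for (const t of tests) {
    beforeEachHook?.();
    try {
      t.run();
    } catch {
      failed.push(t.name);
    }
  }
  return failed;
}

function expectSize(n: number): void {
  if (cache.size !== n) throw new Error(`expected ${n}, got ${cache.size}`);
}

// Both tests assume they start with an empty cache.
const tests: Test[] = [
  { name: "stores first key", run: () => { cache.set("a", 1); expectSize(1); } },
  { name: "stores second key", run: () => { cache.set("b", 2); expectSize(1); } },
];

// Bug: nothing resets the state, so "a" leaks into the second test.
const buggy = runSuite(tests);

// Fix 1 (Auto-style): rebuild the setup in a beforeEach-style hook.
const rebuilt = runSuite(tests, () => { cache = new Map(); });

// Fix 2 (Sonnet-style): keep the same instance and clear() it before each test.
const cleared = runSuite(tests, () => cache.clear());

console.log({ buggy, rebuilt, cleared });
```

One trade-off between the two: rebuilding creates a new object, so it only helps if everything reads the state through the reassigned binding, while `.clear()` keeps the same instance, so code that captured a reference to it also sees the reset.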

Big caveat: I ran this through the CLI, which doesn’t show which model Auto actually selected. So I can see outcomes were equivalent but can’t tell you the routing. If anyone knows how to pull that metadata I’d love to re-run this with visibility into what Auto is actually picking.

Other caveats: Small sample (5 tasks), clean TypeScript/React project, one language. A messy real-world codebase might behave differently.

Based on this though, the “Auto is cheating you” concern didn’t hold up. At least not for these task types.

Anyone else tested this? Curious if your results match or if there are specific scenarios where Auto clearly picks wrong.


If you are willing to run this again, I’d try with Composer 1.5

Why? It uses the Auto pool, so your results would be consistent. If Composer 1.5 is equal to Sonnet 4.5, then you could deem Sonnet 4.5 “deprecated” for such tasks.

I didn’t include Composer 1.5 because I wasn’t sure how to isolate it in the model picker. It seems like it’s what Auto routes to, but I can’t confirm that. If there’s a way to select it directly, that’d make the comparison way cleaner. Do you know if it’s exposed anywhere, or only through Auto?

I can just pick it via the selector :face_with_spiral_eyes:

And yeah, I reckon Auto is picking it by default, as it’s probably ‘cheaper’ to run internally (while still giving good results) than an external model, outside of Kimi.