I ran a quick test: one messy Express file (hardcoded creds, duplicated auth, no structure), same prompt to all 5 Pro models. “Refactor this into a clean structure.” Nothing else.
Every model handled the basics fine. What I didn’t expect was all the stuff they added that nobody asked for.
What each model added unprompted
Opus caught that my JWTs had no expiry and added expiresIn: '24h'. It also fixed a status code bug (original used 400 for invalid tokens, should be 401) and created a .gitignore and .env file.
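To make the expiry and status code points concrete, here's a minimal sketch. It is not Opus's actual output. It hand-rolls an HS256 token with Node's built-in crypto module so it runs without dependencies; a real project would more likely use a JWT library. The names signToken, checkToken, and SECRET are hypothetical, made up for illustration.

```javascript
const crypto = require('node:crypto');

// Hypothetical secret for illustration; a real app would read it from the environment.
const SECRET = 'dev-only-secret';
const DAY_SECONDS = 24 * 60 * 60;

const b64url = (value) => Buffer.from(value).toString('base64url');

// HMAC-SHA256 signature over "header.body", encoded as base64url.
const signPart = (data) =>
  crypto.createHmac('sha256', SECRET).update(data).digest('base64url');

// Sign a payload as an HS256 JWT that expires after ttlSeconds (default: 24h).
function signToken(payload, ttlSeconds = DAY_SECONDS, now = Date.now()) {
  const header = b64url(JSON.stringify({ alg: 'HS256', typ: 'JWT' }));
  const exp = Math.floor(now / 1000) + ttlSeconds;
  const body = b64url(JSON.stringify({ ...payload, exp }));
  return `${header}.${body}.${signPart(`${header}.${body}`)}`;
}

// Returns status 200 with the payload for a valid token,
// and 401 (not 400) for a malformed, tampered, or expired one.
function checkToken(token, now = Date.now()) {
  const [header, body, sig] = String(token).split('.');
  if (!header || !body || !sig) return { status: 401, error: 'Malformed token' };

  const given = Buffer.from(sig);
  const expected = Buffer.from(signPart(`${header}.${body}`));
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) {
    return { status: 401, error: 'Invalid token' };
  }

  const payload = JSON.parse(Buffer.from(body, 'base64url').toString('utf8'));
  if (payload.exp <= Math.floor(now / 1000)) {
    return { status: 401, error: 'Token expired' };
  }
  return { status: 200, payload };
}
```

Without the exp claim, a leaked token stays valid forever, which is why the missing expiry matters more than it looks.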
Sonnet caught the same JWT issue but made it configurable (JWT_EXPIRES_IN || '24h'). It also threw in a /health endpoint and a README. If you’re about to deploy behind a load balancer, that health check saves you a step.
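Roughly what those two additions look like, as a sketch rather than Sonnet's actual output. The loadConfig and health names and the response body are my own assumptions; only the JWT_EXPIRES_IN fallback and the /health path come from the run.

```javascript
// Read the token lifetime from the environment, falling back to 24 hours.
function loadConfig(env = process.env) {
  return { jwtExpiresIn: env.JWT_EXPIRES_IN || '24h' };
}

// Express-style handler: a load balancer only needs a fast 200 from this route.
// The response shape here is an assumption, not what the model generated.
function health(req, res) {
  res.status(200).json({ status: 'ok' });
}

// In an Express app this would be mounted as: app.get('/health', health);
```

Keeping the handler free of database or auth checks means it answers even when downstream services are slow, which is usually what a basic liveness probe wants.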
GPT took the most architectural approach: a custom HttpError class, an asyncHandler wrapper (Express 4 doesn’t forward rejected promises from async handlers to the error middleware, so this one actually matters; Express 5 does it for you), and a split of app.js from server.js for testability.
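The HttpError and asyncHandler pair is a common pattern, so here's a minimal sketch of the idea. This is not GPT's actual output; the errorHandler function and the response shape are assumptions added for illustration.

```javascript
// Error type that carries an HTTP status alongside the message.
class HttpError extends Error {
  constructor(status, message) {
    super(message);
    this.name = 'HttpError';
    this.status = status;
  }
}

// Wrap an async route handler so a rejected promise is passed to next(),
// where Express error middleware can pick it up.
const asyncHandler = (fn) => (req, res, next) =>
  Promise.resolve(fn(req, res, next)).catch(next);

// Express-style error middleware (four arguments). Known HttpErrors keep
// their status; anything else becomes a generic 500.
function errorHandler(err, req, res, next) {
  const status = err instanceof HttpError ? err.status : 500;
  const message = status === 500 ? 'Internal Server Error' : err.message;
  res.status(status).json({ error: message });
}

// In an Express app this would be wired up roughly as:
//   app.get('/me', asyncHandler(async (req, res) => { ... }));
//   app.use(errorHandler);
```

With this in place a route can just `throw new HttpError(401, 'Invalid token')` inside an async function instead of repeating try/catch blocks in every handler.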
Gemini just did the refactor. Nothing extra added. It was also the slowest at ~4m 21s, which is hard to justify when the others add more and finish faster.
Auto picked Opus.
Timing
| Model | Time |
|---|---|
| Sonnet | ~60s |
| GPT | ~1m 21s |
| Opus | ~2m |
| Gemini | ~4m 21s |
Anyway
I’ve been defaulting to Sonnet for most things. Fast, practical additions. Opus when I care about security stuff. GPT if I want it to over-engineer things (sometimes that’s what you want).
The JWT expiry fix and the asyncHandler are things I would’ve missed in a real project, honestly. Worth diffing the output even on simple refactors.
I tested this with Cursor CLI agent mode. Input and outputs were committed to git before and after each run.