Which LLMs are the best bang for your buck? Re: large plan execution, refactoring, test suite updates, etc.

I usually use Sonnet 4.5 (Thinking), but this last billing cycle I ran out of tokens really fast. I’m curious which model everyone prefers for execution work that needs a little more juice than Auto provides. To what extent does the TYPE of work dictate the model you use? For instance, is refactoring better with X model but executing large plans better with Y model? Simple question, probably even a noob question… oh well. :slight_smile: TIA.


For small tasks, always:

  • GPT 5 Codex Mini / Auto for reasoning
  • Grok Code for execution

For complex tasks:

  • Codex CLI for planning / execution

Try this rule. It has saved me a lot of tokens since I started using it. Output tokens usually cost more than input tokens, and input can be cached at a discount while output is always billed at 100% of the price. Claude often gives long, detailed explanations even when not asked, so your output costs end up very high.

:high_voltage: ACTION > TALK/EXPLAIN:

→ Agent can think deeply/long, but TEXT RESPONSE (talking/explaining) must be as concise as possible
→ This does NOT affect code quality - only agent’s verbal output should be minimal
→ Only explain in detail WHEN USER EXPLICITLY REQUESTS
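
To put rough numbers on why shorter replies help, here is a minimal sketch of the cost arithmetic. The prices and token counts are made-up assumptions for illustration, not real rates for any model or plan, and `request_cost` is a hypothetical helper, so plug in your own provider's numbers.

```python
# Hypothetical per-million-token prices in USD; NOT real rates for any model.
INPUT_PRICE = 3.00          # assumed price for fresh (uncached) input
CACHED_INPUT_PRICE = 0.30   # assumed discounted price for cached input
OUTPUT_PRICE = 15.00        # assumed output price; output is never cached


def request_cost(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one request under the assumed prices."""
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * INPUT_PRICE
        + cached_tokens * CACHED_INPUT_PRICE
        + output_tokens * OUTPUT_PRICE
    )
    return total / 1_000_000


# Same prompt (20k input tokens, 18k of them cached), two reply lengths.
verbose = request_cost(20_000, 18_000, 2_000)
concise = request_cost(20_000, 18_000, 400)
print(f"verbose reply: ${verbose:.4f}")
print(f"concise reply: ${concise:.4f}")
```

With these assumed prices, the 2,000-token reply makes up most of the verbose request's cost even though the prompt is ten times larger, so trimming the explanation is where the savings come from.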

Composer is excellent at bug fixing and fast execution. I’ve enjoyed using Opus during the low-cost trial, but I’m happy to stick with Composer for most tasks.

Gemini is my go-to for UI changes, though.