I can use Opus 4.5/Sonnet 4.5 thinking at a cheaper price

DemonVN · December 11, 2025, 8:43am

Hi everyone,

We pay for both input and output tokens. Input is usually cheaper than output, and input can be cached while output cannot, so every output token is billed at full price. If we instruct the Agent to minimize its output, we can save a lot of tokens. The rule I created aims to minimize the Agent’s output while keeping reasoning, logic, code quality, and intelligence at their maximum, and to avoid wasteful token behaviors such as summarizing tasks after completion, creating unnecessary .md and .txt files, and giving long, detailed explanations when none were requested.
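To make the arithmetic concrete, here is a rough sketch of how the cost of one request splits between fresh input, cached input, and output. All prices and token counts are made-up placeholders, not real rates for any model or plan, and the `Pricing` and `request_cost` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical USD prices per million tokens (placeholders, not real rates)."""
    input_per_mtok: float
    cached_input_per_mtok: float
    output_per_mtok: float


def request_cost(pricing: Pricing, input_tokens: int,
                 cached_fraction: float, output_tokens: int) -> float:
    """Estimate the cost of one request in USD.

    cached_fraction is the share of input tokens served from cache (0.0 to 1.0).
    Output tokens are always billed at the full output price.
    """
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    total = (fresh * pricing.input_per_mtok
             + cached * pricing.cached_input_per_mtok
             + output_tokens * pricing.output_per_mtok)
    return total / 1_000_000


# Assumed numbers purely for illustration.
PRICING = Pricing(input_per_mtok=4.0, cached_input_per_mtok=0.5, output_per_mtok=20.0)

verbose = request_cost(PRICING, input_tokens=50_000, cached_fraction=0.8, output_tokens=2_000)
terse = request_cost(PRICING, input_tokens=50_000, cached_fraction=0.8, output_tokens=500)
saving = 1 - terse / verbose

print(f"verbose: ${verbose:.4f}  terse: ${terse:.4f}  saving: {saving:.0%}")
```

With these placeholder numbers, cutting the response from 2,000 to 500 tokens lowers the request cost by about 30%. The real saving depends on your actual prices, how much of the input is cached, and whether reasoning tokens are billed as output on your plan, so plug in your own figures before drawing conclusions.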

Import this rule and every Agent will save as many tokens as possible for you:

ACTION-FIRST PRINCIPLE

MAXIMUM QUALITY + MINIMUM VERBOSITY:

MUST BE MAXIMUM:
Thinking depth & reasoning quality
Code quality & logic accuracy
Problem analysis & solution design
Strategic planning & risk assessment

MUST BE MINIMUM:
Text responses (talking/explaining)
Verbose descriptions
Unnecessary elaboration
Redundant explanations

RULE: Think DEEPLY, Speak BRIEFLY
→ Only explain in detail WHEN USER EXPLICITLY REQUESTS

TOKEN OPTIMIZATION RULES

CORE PRINCIPLE: Maximum Intelligence, Minimum Words

WHAT TO MAXIMIZE (Never compromise):
→ Thinking depth & cognitive effort
→ Code quality & correctness
→ Logic & reasoning accuracy
→ Solution completeness

WHAT TO MINIMIZE (Aggressively reduce):
→ Verbal output in responses
→ Explanations (unless asked)
→ Lists, summaries, descriptions
→ Any “filler” text

DOCUMENTATION:
NEVER create .md/README files unless explicitly requested

RESPONSE STYLE:
→ Execute → Confirm briefly → Done
→ NO unnecessary elaboration
→ User will ask if they need details

BROWSER INTERACTION:
When browser encounters block/captcha → STOP, INFORM user, REQUEST access

wamalalawrence · January 27, 2026, 8:42am

thanks for sharing

RecLord · January 28, 2026, 10:27am

Thank you I will use this

neverinfamous · January 28, 2026, 11:24am

I wonder if talking through things helps the agent.

liquefy · January 28, 2026, 1:00pm

I wonder if this actually helps? It seems this only forces the agent to stop writing long responses, which saves just a few tokens?

neverinfamous · January 28, 2026, 1:58pm

Less reading is nice, though, regardless of tokens. But I worry it could reduce effectiveness.

DemonVN · January 29, 2026, 4:35am

It’s actually effective in terms of saving tokens. Read the rules carefully; I only ask the Agent to “talk” less while maximizing the quality of its thinking. Basically, it should think thoroughly to solve the problem and then report back as concisely as possible.

The second reason is that I’m very dissatisfied with Agent responses that are too long, wasting my time reading and understanding them. I only want long and detailed responses when I request them.
