Request Consumption Fairness Between GPT-5 Variants

JimKarter · August 13, 2025, 4:16am

I’ve noticed that GPT-5-low, GPT-5, and GPT-5-high have different “reasoning effort” levels (“low,” “medium,” and “high,” respectively), yet each request to any of them consumes exactly 2 request credits.

This feels unfair because the computational cost and output depth are clearly different. Users who choose GPT-5-low for lighter, faster tasks are still charged the same as those using GPT-5-high for deep reasoning, which discourages optimal model selection.

Could you consider adjusting the request cost to reflect the actual reasoning effort of each variant? For example, GPT-5-low could consume 1 credit, standard GPT-5 1.5, and GPT-5-high 2.

This would better align pricing with resource usage, encourage thoughtful model choice, and improve overall user satisfaction.
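To make the proposal concrete, here is a minimal sketch comparing the current flat cost with the tiered cost suggested above. The model keys, the function name, and the sample usage mix are hypothetical and chosen only for illustration; the per-request numbers are the ones quoted in this post.

```python
# Per-request credit cost: the current flat scheme vs. the tiered scheme
# proposed in this post. Model keys are illustrative labels, not API names.
FLAT_COST = {"gpt-5-low": 2.0, "gpt-5": 2.0, "gpt-5-high": 2.0}
PROPOSED_COST = {"gpt-5-low": 1.0, "gpt-5": 1.5, "gpt-5-high": 2.0}


def credits_used(requests: dict[str, int], cost_table: dict[str, float]) -> float:
    """Sum the credits consumed by a mix of requests under a given cost table."""
    return sum(cost_table[model] * count for model, count in requests.items())


# Hypothetical usage mix: mostly light tasks, a few deep-reasoning ones.
sample_usage = {"gpt-5-low": 10, "gpt-5": 4, "gpt-5-high": 2}

flat_total = credits_used(sample_usage, FLAT_COST)
proposed_total = credits_used(sample_usage, PROPOSED_COST)
print(f"flat: {flat_total}, proposed: {proposed_total}")
```

Under the flat scheme every request costs the same, so the total depends only on how many requests are made; under the tiered scheme the total drops as more of the work moves to the lower-effort variants.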

webdo · August 13, 2025, 4:24am

The pricing is the same across all GPT-5 variants because the additional “reasoning effort” primarily generates more internal reasoning tokens rather than significantly increasing the core computational cost, so you’re essentially paying the same base price regardless of how much internal reasoning the model does before giving you the final response.

JimKarter · August 13, 2025, 6:09am

Well, then there is no reason to choose low or medium. That’s my concern.

JimKarter · August 13, 2025, 6:10am

Especially since I’m still on request-based pricing, so the pressure is even higher.

f00z · August 13, 2025, 6:11am

That’s not exactly how it works. When you make an API request to the LLM, you can (with some models) tell it how many tokens to use for thinking. Say it has an output budget of 50k tokens: you can tell it to use 20k for thinking, so the final output will be at most 30k. Or you can tell it to use 1k for thinking and get up to 49k of output. It’s still the same number of tokens; the computational cost of a thinking token isn’t much different from that of an output token.
Since GPT doesn’t let you specify the exact token amount, they have this low/default/high setting, which is good enough I suppose. Low is much faster. Try it out.

In other words, with ‘low’ you could get a lot more code output in one request, whereas with ‘high’ you’d get a lot of thinking but a smaller output. Does that make sense?
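The arithmetic in this post can be sketched as follows. This is only an illustration of the shared-budget idea as described here, not how any particular provider implements it; the function name and the 50k figure are assumptions taken from the example above.

```python
def max_visible_output(total_budget: int, thinking_tokens: int) -> int:
    """Tokens left for the final answer when thinking and output share one budget.

    Assumes the simple model described in the post: a single output budget
    that is split between thinking tokens and visible output tokens.
    """
    if not 0 <= thinking_tokens <= total_budget:
        raise ValueError("thinking_tokens must be between 0 and total_budget")
    return total_budget - thinking_tokens


TOTAL_BUDGET = 50_000  # hypothetical budget from the example in the post

print(max_visible_output(TOTAL_BUDGET, 20_000))  # heavy thinking, smaller answer
print(max_visible_output(TOTAL_BUDGET, 1_000))   # light thinking, larger answer
```

Under this model the total token count stays fixed, and the reasoning setting only changes how it is divided.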

JimKarter · August 13, 2025, 7:04am

I’ve been using GPT-5-high, medium, and low for several hours already (3-4 hours per model), but my experience is different.

I gave GPT-5-low a simple task, like fixing a linter error, but it read 10 files, thought too much, and then suddenly gave up entirely, or just ended with an explanation of how to fix it and left the fix to the human instead of doing the agentic work.

I don’t think the token budget is what differs between the reasoning settings; GPT-5 gives up too quickly and sometimes doesn’t do agentic work.

flicky · August 13, 2025, 9:00am

I use low-fast for this reason.

condor · August 13, 2025, 9:04am

Note that -fast consumes double the requests of the regular model, as it uses a priority queue at OpenAI.

flicky · August 13, 2025, 9:07am

That’s fine for the trial.

After the trial, and with Auto charging, I’ll need to reassess my use of Cursor, period.
