Request Consumption Fairness Between GPT-5 Variants

I’ve noticed that GPT-5-low, GPT-5, and GPT-5-high have different “reasoning effort” levels (“low,” “medium,” and “high,” respectively), yet each request to any of them consumes exactly 2 request credits.

This feels unfair because the computational cost and output depth are clearly different. Users who choose GPT-5-low for lighter, faster tasks are still charged the same as those using GPT-5-high for deep reasoning, which discourages optimal model selection.

Could you consider adjusting the request cost to reflect the actual reasoning effort of each variant? For example, GPT-5-low could consume 1 credits, GPT-5 standard 1.5, and GPT-5-high 2.

This would better align pricing with resource usage, encourage thoughtful model choice, and improve overall user satisfaction.

The pricing is the same across all GPT-5 variants because the additional “reasoning effort” primarily generates more internal reasoning tokens rather than significantly increasing the core computational cost, so you’re essentially paying the same base price regardless of how much internal reasoning the model does before giving you the final response.

Well, then there is no reason to choose low and medium, that’s my concern.

Especially I’m still request-based so the pressure is more higher.

That’s not exactly how it works. When you do an API request to the LLM you can (in some models) tell it how many tokens to use for thinking. So it has a budget output of say 50k tokens, you can say use 20k for thinking, thus the output will be smaller 30k max. Or you can tell it to use 1k thinking and get 49k output. It’s still the same amount of tokens, the computational cost isn’t much different from a thinking token vs output token.
Since GPT doesn’t let you specify the exact token amount, they have this low/default/high setting which is good enough I suppose. Low is much faster. Try it out.

In other words, if you use ‘low’ you could get a lot more code output from it in one request vs high you’d get a lot of thinking but smaller output, does that make sense?

I’ve been using GPT-5-High and medium and low several hours already (each model 3-4 hours) but I feel differet.

I asked simple task to GPT-5-low like, fixing linter error but GPT-5-low reads 10 files then thinks too much and suddenly gave up everything or just ended up with explanation of how to fix and let humans to fix, instead of agentic work.

I don’t think token budget is on whatever the reasoning setting is; GPT-5 gave up too quicky and doesn’t do agentic work sometimes.

I use low-fast for this reason.

Note that -fast consumes double of regular model as it is a priority queue at OpenAI.

That’s fine for the trial.

After trial and with Auto charging I’ll need to reassess my use of Cursor, period.