Statistics for GPT-5.4, High, and Extra High modes

Feature request for product/service

Describe the request

Hello!
There’s a general description of GPT-5.4 on the GPT-5.4 | Cursor Docs page, but I’m missing information about the differences between the default GPT-5.4, High, and Extra High modes. Are there any publicly available statistics or metrics (accuracy, speed, cost, latency, etc.) for these three variants? I suspect Extra High is too expensive for most tasks.

Just use medium. Cheap, fast and smart.
Introducing GPT-5.4 | OpenAI

Hey, good request. High and Extra High are GPT-5.4 reasoning effort levels. They control how much thinking the model does before replying, which directly affects latency, output quality, and cost.

Right now our docs page covers GPT-5.4 at a high level: GPT-5.4 | Cursor Docs
Pricing details are here: https://cursor.com/docs/models-and-pricing

For detailed benchmarks comparing different reasoning effort levels, the OpenAI docs are the best source. They publish the accuracy vs speed tradeoffs by effort level. The link @jes shared is a good starting point.

In short, higher reasoning effort means more reasoning tokens, so higher cost and latency, but potentially better results on hard tasks. For most coding tasks, the default mode or High usually gives a good balance.
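To make that cost relationship concrete, here is a rough sketch of the arithmetic. It assumes reasoning tokens are billed at the output-token rate, and every number in it (prices, token counts, the effort labels used as dictionary keys) is a made-up placeholder, not actual GPT-5.4 or Cursor pricing. Plug in the real rates from the pricing page before drawing conclusions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Token counts for one request (all values here are hypothetical)."""
    input_tokens: int
    reasoning_tokens: int       # hidden "thinking" tokens produced before the reply
    visible_output_tokens: int  # tokens in the reply the user actually sees


def estimate_cost_usd(usage: Usage, input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate request cost, assuming reasoning tokens are billed as output tokens."""
    billed_output = usage.reasoning_tokens + usage.visible_output_tokens
    total = usage.input_tokens * input_price_per_m + billed_output * output_price_per_m
    return total / 1_000_000


# Placeholder prices in USD per million tokens (NOT real pricing).
INPUT_PRICE = 2.0
OUTPUT_PRICE = 10.0

# Same prompt and same visible answer length; only the amount of reasoning differs.
# The reasoning token counts are invented purely for illustration.
SCENARIOS = {
    "medium": Usage(input_tokens=10_000, reasoning_tokens=1_000, visible_output_tokens=500),
    "high": Usage(input_tokens=10_000, reasoning_tokens=4_000, visible_output_tokens=500),
    "extra_high": Usage(input_tokens=10_000, reasoning_tokens=16_000, visible_output_tokens=500),
}

costs = {
    name: estimate_cost_usd(usage, INPUT_PRICE, OUTPUT_PRICE)
    for name, usage in SCENARIOS.items()
}

if __name__ == "__main__":
    for name, cost in costs.items():
        print(f"{name}: ${cost:.3f} per request")
```

With these placeholder numbers the input cost is identical across all three levels, and the whole difference comes from the extra reasoning tokens. Latency tends to move the same way, since those tokens have to be generated before the visible reply starts.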

That said, fair point that our docs could do a better job explaining these modes and when each one makes sense.
