GPT-5-Codex discussion

What thinking level is it actually, gpt-5-codex-medium?

High should be offered; it's much, much better :slight_smile:

@ArtAndrew @leoing The OpenAI API doesn't expose a reasoning-effort setting for gpt-5-codex, which means there's no low/medium/high; there's only gpt-5-codex.

Even if Cursor wanted to add a -high option, it's simply not possible because of OpenAI.
Maybe at some point OpenAI will provide reasoning effort, but it's hard to say.

I do think there are high/medium/low levels; at least in codex-cli you can choose the effort you want for gpt-5-codex.
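For reference, in codex-cli the effort is selectable via its config file. A sketch, assuming the `model_reasoning_effort` key from Codex CLI's `config.toml` (check your CLI version's docs for the exact key name):

```toml
# ~/.codex/config.toml — sketch; key names assumed from Codex CLI docs
model = "gpt-5-codex"
model_reasoning_effort = "high"   # low | medium | high
```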

@sanjeed5 Can you confirm whether there will be only one model for gpt-5-codex, or high/medium/low variants? Also, from my experience with codex-cli, the agent.md that gpt-5-codex generated is very minimal (below 50 lines for a 200k-line monorepo) and it works totally fine. So a short system prompt probably works better for gpt-5-codex.

Codex-cli plays by different rules. The reasoning modes there work only through a ChatGPT subscription, not through the API. Cursor works only through the API, and this feature isn't available that way.

I see, thanks for the clarification.

The Codex extension really shines in React projects. I mean really shines.

I don't use agentic stuff; I'm usually just asking it questions. For that use case, I'm absolutely blown away by this model. There's a very noticeable uptick in quality over the others, yet it is much faster than even gpt-5-high-fast.

This model seems to reason about complex tasks very well and accurately knows what to look up in a codebase. I haven't had one blatantly wrong answer from it yet. It doesn't give lots of "plausible, but wrong" answers and cuts out the fluff. Responses are finally the length I've always wanted them to be. Other models wouldn't even cut down on length when you told them to (or had memories instructing them to).

I used GPT-5-codex with a very detailed requirements and development document. I clearly explained what needed to be done, and it didn't do what I asked. Some implementations were missing. Some things, such as writing tests, were completely ignored even when I asked explicitly multiple times; I couldn't make the model write tests. In addition, it always asks questions. That's sometimes good: it's fine for a model to ask questions before doing any development work. But when I ask it to write tests with sufficient detail, it shouldn't just describe what needs to be done and ask whether to implement it. Even when instructed not to ask questions and to implement the tests, it still doesn't. All I get is an explanation and a question.

Horrible model. I switched back to Claude 4, which handled everything in a single prompt.

Why are so many complaining? I have the feeling it actually works pretty well.

Yes, it maybe takes more time than gpt-5-high, but the output seems great.

I noticed that, unlike the standard models, GPT-5-Codex can't check auto-lints.

It does that all the time for me using flutter analyze (but in codex-cli).

gpt-5-codex struggles with user rules. I had to remove them; otherwise it wouldn't know how to act.

The output of GPT-5-codex is great compared to GPT-5 high, medium, or low: codex actually follows instructions without missing anything. But it's so, so slow.
It seems much closer to an autonomous software developer than any other model available on the planet. It's just slow, still makes mistakes, and sometimes puts its own spin on things, requiring you to go back and tell it not to do that. It's reliable, but slow.

I think people have a perception that it's worse because when you need to fix a prompt and rerun it twice, that takes 30 minutes with Codex but 5 minutes with Sonnet 4.
It definitely is quirky, though.

It was probably following your user rules too well. I ran into the same issue.

This. It's the first model that actually adheres to the rules at all.