What thinking level is it actually, gpt-5-codex-medium?
High should be offered - much much much better
@ArtAndrew @leoing The OpenAI API doesn't provide a reasoning effort setting for gpt-5-codex, which means there's no low, medium, or high; there's only gpt-5-codex.
Even if Cursor wanted to add a -high option, it's just not possible because of OpenAI.
Maybe at some point OpenAI will provide reasoning effort, but it's hard to say.
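For context, here is a minimal sketch of the request shape used to select reasoning effort via the OpenAI API for models that do expose the knob (e.g. gpt-5). The helper function and prompt text are illustrative, not from this thread; the thread's point is that gpt-5-codex did not accept this parameter at the time.

```python
def build_request(model: str, effort: str, prompt: str) -> dict:
    """Build a Responses API payload with an explicit reasoning effort.

    `effort` is the documented reasoning knob: "low" | "medium" | "high".
    Whether a given model (like gpt-5-codex) honors it is up to OpenAI.
    """
    return {
        "model": model,
        "reasoning": {"effort": effort},
        "input": prompt,
    }

# Example: requesting high effort from a model that supports the knob.
payload = build_request("gpt-5", "high", "Refactor this function")
```

This payload would then be sent via the OpenAI client (`client.responses.create(**payload)`); if the model doesn't support the `reasoning.effort` field, the API simply has no high/medium/low distinction to offer, which is the limitation described above.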
I do think there are high/medium/low; at least in codex-cli you can choose the effort you want for gpt-5-codex.
@sanjeed5 Can you confirm whether there will be only one model for gpt-5-codex, or high/medium/low variants? Also, from my experience with codex-cli, the agent.md that gpt-5-codex generated is very minimal (below 50 lines for a 200k-line monorepo) and it works totally fine. So a short system prompt probably works better for gpt-5-codex.
Codex-cli plays by different rules. The reasoning modes there work only through a ChatGPT subscription, not through the API. Cursor works only through the API, and this feature isn't available that way.
I see, thanks for the clarification.
Codex extension kicks *** in React projects. I mean a lot of kicks.
I don't use agentic stuff; I'm usually just asking it questions. For that use case, I'm absolutely blown away by this model. Very noticeable uptick in quality over the others, yet it is much faster than even gpt-5-high-fast.
This model seems to be able to reason about complex tasks very well and accurately knows what to look up in a codebase. I haven't had one blatantly wrong answer from it yet. It doesn't give lots of "plausible, but wrong" type answers and cuts out the fluff. Responses are finally the length I've always wanted them to be. Other models wouldn't even cut down on length when you told them to (or had memories instructing them to).
I used GPT-5-codex with a very detailed requirements and development document. I clearly explained what needed to be done, and it doesn't do what I ask. Some implementations are missing, and some things (such as writing tests) are completely ignored even when I ask for them explicitly, multiple times. I couldn't make the model write tests. In addition, it always asks questions. This is sometimes good: it's good for a model to ask you questions before doing any development work. But when I ask it to write tests with sufficient detail, it shouldn't just describe what needs to be done and ask me whether to implement it or not. Even if you instruct it not to ask questions and to implement the tests, it still doesn't do it. The only thing I get is an explanation and another question.
Horrible model. Switched back to Claude 4, which handled everything in a single prompt.
Why are so many complaining? I have the feeling it actually works pretty well.
Yes, it maybe takes more time than gpt-5-high, but the output seems great?
I noticed that GPT-5-Codex can't check auto lints, compared to standard models.
It does it all the time for me using flutter analyze (but in codex cli).
gpt-5-codex struggles with user rules. I had to remove them; otherwise it wouldn't know how to act.
The output of GPT-5-codex is great compared to GPT-5-high, medium, or low: codex actually follows instructions without missing anything. But it's so, so slow.
It seems much closer to an autonomous software developer than any other model available on the planet. It's just slow, still makes mistakes, and sometimes takes its own spin on things, requiring you to go back and tell it not to do that. It's reliable, but slow.
I think people have a perception that it's worse, because when you need to fix a prompt and rerun it twice, that takes 30 minutes with Codex, but 5 minutes with Sonnet 4.
It definitely is quirky though.
It was probably following your user rules too well. I found that same issue.
This. First model actually adhering to rules at all.