faster and output (in my testing) on a similar level
Is it paid or free to try?
paid afaik
got to try it, then
Hi,
The flag makes find_family_for_model ignore whatever model slug you asked for and treat it as codex-experimental instead. That turns on everything tied to that family—experimental reasoning summaries, the
gpt_5_codex_prompt.md instructions, parallel tool calls, and the ability to expose apply_patch as a freeform tool. In short, it forces your Codex instance into the experimental mode no matter which model you
originally requested.
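To make the override concrete, here is a minimal sketch of that behavior. This is purely illustrative, assuming a lookup function keyed on the model slug; the field names, the `experimental_flag` parameter, and the `ModelFamily` structure are all hypothetical and not the actual Codex source.

```python
from dataclasses import dataclass

@dataclass
class ModelFamily:
    slug: str
    reasoning_summaries: bool    # experimental reasoning summaries
    base_instructions: str       # which prompt file to load
    parallel_tool_calls: bool
    apply_patch_freeform: bool   # expose apply_patch as a freeform tool

# Hypothetical family definitions for illustration only.
CODEX_EXPERIMENTAL = ModelFamily(
    slug="codex-experimental",
    reasoning_summaries=True,
    base_instructions="gpt_5_codex_prompt.md",
    parallel_tool_calls=True,
    apply_patch_freeform=True,
)

DEFAULT = ModelFamily(
    slug="default",
    reasoning_summaries=False,
    base_instructions="default_prompt.md",
    parallel_tool_calls=False,
    apply_patch_freeform=False,
)

def find_family_for_model(slug: str, experimental_flag: bool) -> ModelFamily:
    # With the flag set, the requested slug is ignored entirely and the
    # experimental family is returned regardless of what was asked for.
    if experimental_flag:
        return CODEX_EXPERIMENTAL
    return DEFAULT
```

So calling `find_family_for_model("gpt-5", experimental_flag=True)` would still hand back the `codex-experimental` family with all of its features enabled, which matches the behavior described above.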
Eats money like Claude
I started the task on Cheetah, underestimating the difficulty. I saw the cost and stopped before it finished. I restarted on GPT-5, which finished the task cheaper (and most likely better).
Very fast. It made some infrastructure changes after inspecting my current setup through the AWS CLI, much faster than usual and with excellent accuracy (it made, and resolved, its one syntax error across the few dozen commands it ran).
I wouldn’t be surprised if this model is from Cursor. I’ve seen and felt this type of performance and response when using the Auto model.
Despite it claiming to be Claude, I don’t think it’s from Anthropic. My bet is that it’s a Grok model from xAI that was heavily trained on Claude outputs to the point where it thinks it’s Claude. (It wouldn’t be the first xAI model to do that.)
The fact that this model suggests it might be all kinds of things makes it more believable that it's Cursor's own model, trained on user data containing multiple models' outputs (with the majority of it being Claude).
xAI has no reason to release an overpriced model that's not SOTA. But for a new provider that's still gathering experience, I can imagine that much more easily.
I am not quite sure what it all means and if it matters at all.
But I do have an account through my workplace, and when I use this model, the Kind column says it's included, and the same note appears after its price.
But it says the same thing in other models as well, like GPT-5 and Claude models.
According to my dashboard, my Included Usage is capped out, and has been for a day or two, but I was still able to use GPT-5 and Claude 4.5 without eating into my On-Demand Usage.
This happens because Cursor provides some undocumented, unpredictable quota on top of the paid quota.
In my tests, this model performed poorly. It might be a decent option for autocomplete, but the model can’t even handle search queries. It simply comes up with answers that don’t correspond to reality.
Yeah, I have done a few queries with it as well and sure it’s fast.. really fast! But it’s not nearly as accurate as Claude 4.5, GPT-5 or even Claude 4 Sonnet.
The only thing I am impressed by so far is its speed. But we need more than speed, we also need accuracy which seems to be lacking with this model.
I had a good experience with it, to me it feels like sonnet 4.5 level intelligence.
Sonnet is not a really accurate model; it often misses details or hallucinates stuff. I never rely on it to explain how things work because I do not trust it.
This is all feelings and not quantitative testing, but if I were to give this model a composite score based on speed and ability, it would definitely be in the top 3. I have been using it extensively and it has been a joy to work with. Worth mentioning: I am not a big fan of handing over massive chunks of work. I typically break work down by function, as I prefer a certain structure/design, and it is so quick that I can stay in the flow.
Be careful.
VERY quick model. However, when asked to fix some failing tests, it decided the fastest way to fix them was to delete the file. For the most part, though, I was really happy with it.
Super fast. Does NOT give good options on how to fix things, though; when asked how to do something, it gives only one answer.
This is a really fast model that executes commands correctly. I wonder who is behind it…
I wrote about 60,000 lines of code with this model, and the most important thing I can conclude from my experience is: First, talk to the model about the code you want to implement in “Ask” mode, then ask it to fulfill your requests in “Agent” mode.
Or do the code planning with another more powerful model (e.g., GPT-5-high), then ask this model to develop it.
Yeah, I have since had a closer look at the code, and it's about turning on experimental features.
What you said. Plan first, act later. This model is absolutely perfect for that. I find gpt-5-codex is a model that does better without the Plan/Act split. It may sound counter-intuitive, but models with a lot of reasoning need the space to think as they work. You end up using their probability distributions much more than if you give them a single path of action that may well be wrong in the first place. Although I suspect there's a workaround for that, which I will try.