So I’ve attached a screenshot of the standard models.
My gripe (and I’m not trying to pick fights, but hoping to get actual info) is that one-word descriptors like “medium” or “high” and tiny blurbs are not exactly informative.
For example: Codex 5.3 is described as “good for ambitious coding tasks,” GPT-5.5 is “Great for complex tasks,” and Sonnet 4.6 is “…great for difficult tasks.” These are all the same sentence, just rephrased.
I typically use GPT models because Claude annoyed me back in the day. I prefer Grok (which I use to “audit” Cursor LLMs).
Can anyone share their experiences with any of these models, or any models in particular? I’m leaning towards trying out Opus 4.7 because of the 300K context window, but I don’t trust Cursor’s lack of a pricing descriptor. I’m a little frustrated.
The unfortunate truth here is that your experience with the models will vary greatly depending on your prompting style, project, and overall coding experience. Personally, I think Gemini 3.1 Pro might be the best “bang for your buck” right now of the premium models, but it’s definitely not as capable as GPT-5.5 (the current king). Check out the pricing page and sort the premium table by cost. If you’re using GPT-5.5 or Opus 4.7 for basic requests like “Can you update the CSS file and reduce color saturation of all colors by 5%”, then you are effectively renting an excavator to dig a small hole: it’ll do it, it’ll do it quickly, but darn is it gonna cost you.
One tip I’ll share: Codex 5.3 Spark (Extra High) is FREE and is really solid for small edits (maybe even outperforming Composer 2). Consider using medium-to-large models for planning, then manually chunk the plan into straightforward steps that you can feed to the dumber models for free. This has kept my Pro+ plan limping along even with the recent price increases.
Hey, valid feedback about the descriptions in the picker. They’re short by design since there’s physically not much space there, but the docs have separate pages for each model with a lot more context like strengths, behavior, and nuances:
Warren’s tip about “pick the right model for the task” really works. Opus 4.7 and GPT-5.5 are great for big architecture work, but for small edits like CSS tweaks you’re overpaying by a lot. For that, Codex 5.3 or Composer 2 is usually enough.
On Opus 4.7 and 300K, try it on a specific task that actually needs large context like refactoring a big module or reviewing a large diff. For normal requests, you probably won’t feel much difference vs Sonnet 4.6, but you’ll pay noticeably more.
Hey @Warren_James, this is interesting. I’m looking at Codex 5.3 Spark, but I see it listed as Medium, not Extra High, and the context window is 128k. Just checking with you on it. I don’t think there are two Codex 5.3 Sparks, lol?
Also, @deanrie, I notice my LLM options don’t include Composer 2. Instead I only see Composer 2 (fast). Is this common?
Grok 4.20 doesn’t even have a descriptor?! And I’m noticing Grok 4.20 vs. Composer: why is Grok 4.20 comparable to Composer 2 (fast) in input and cache-read pricing, and actually less expensive on output? Like, what? Doesn’t quite make sense…
Codex 5.3 Spark (Medium, 128k): what you see in the picker is correct. Spark is the Codex 5.3 variant with low reasoning effort. It shows as Medium, has a 128k context, and it’s in the Auto+Composer pool, basically free for Pro. There aren’t two Sparks. I think Warren meant Codex 5.3 on high effort, not Spark, and just mixed up the terms. Spark is good for small edits because it’s cheap and fast, not because it’s Extra High.
Composer 2 vs Composer 2 (fast): regular Composer 2 is still there, it’s just behind a toggle. Hover Composer 2 in the picker, click Edit, then turn off Fast Mode to get regular Composer 2. @kevinn showed it here: Composer 2 Unavailable in the Agent Window - #7 by kevinn
Grok 4.20 with no description: yep, that’s a UI gap, I’ll pass it to the team. The model description is in the docs: Grok 4.20 | Cursor Docs
Grok 4.20 price vs Composer 2 (fast): the logic isn’t cheaper = worse. Composer 2 is our in-house model, optimized for agentic coding in Cursor like tool use, precise file edits, and terminal work. It lives in the Auto+Composer pool with its own economics. Grok 4.20 is billed at the provider’s API rates. Comparing them purely by input/output price doesn’t mean much since they have different strengths, context, and behavior on agent tasks. Best way to choose is to try both on your typical tasks and compare the results, not the price tag.
I use the following models for full-stack development:
Composer 2 for ASK mode, escalate to other models based on answers if necessary
Opus 4.7 for plan mode (hand-off to agents is usually to Composer 2)
Composer 2 for Agent mode
Opus 4.7 for Debug mode (front-end)
Opus 4.7 or Codex 5.3+ for backend and/or terminal stuff
GPT 5.5 for visual understanding of screenshots
I hardly use Auto mode, because the results are too unpredictable. I’m not using Fast or Spark modes because they’re too expensive. I use specialized skills to get to an unambiguous and detailed plan before handing it over to Composer 2. This seems to work well.
I’m usually in the medium or high effort modes of the relevant models; I escalate when necessary, but this is rare. I use GitNexus as a knowledge graph and Context7 to retrieve documentation, and it works well.