I prefer GPT 5.5, but it needs a more detailed prompt, and I start a new chat after 1-2 messages every time. In return it is much more consistent and less likely to miss obvious mistakes, though slightly more likely to over-engineer solutions.
Use multiple separate tasks for backend infrastructure rather than doing everything in one task with subagents.
I used to prefer Claude models but switched with the GPT 5.4 release, and it was a great choice.
GPT-5.5 is twice as expensive as GPT-5.4 due to the doubling of the input cost. I doubt it’s twice as smart. Also, I had issues with the Cursor cache (or it really is TOO EXPENSIVE).
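Whether a doubled input price actually doubles the bill depends on how each request splits between input and output tokens. Here is a rough sketch of that arithmetic. The per-million-token prices and token counts are made up for illustration (they are not the real GPT-5.4 or GPT-5.5 rates), and it assumes only the input price changes:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost in dollars; prices are dollars per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical prices, for illustration only (NOT actual vendor pricing).
OLD = {"input_price": 2.0, "output_price": 10.0}
NEW = {"input_price": 4.0, "output_price": 10.0}  # assumption: only input doubles


def cost_ratio(input_tokens: int, output_tokens: int) -> float:
    """How many times more expensive the same request is at the NEW prices."""
    old = request_cost(input_tokens, output_tokens, **OLD)
    new = request_cost(input_tokens, output_tokens, **NEW)
    return new / old


# Input-heavy agent request (large context, short answer): close to 2x.
print(round(cost_ratio(200_000, 2_000), 2))   # 1.95
# Output-heavy request: barely more expensive.
print(round(cost_ratio(5_000, 20_000), 2))    # 1.05
```

Agent-style coding sessions tend to be input-heavy, so under these assumptions "roughly twice as expensive" is plausible there. Cached input is not modeled and would change the numbers.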
I saw a lot of negativity about 4.7 on various social media platforms, although it worked fine for me during the launch discount.
Both models seem overpriced to me. I’d stick with GPT-5.4 for complex/important tasks, GPT-5.3 Codex/GPT-5.1 for medium tasks, and Composer 2 for simple tasks.
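That kind of tiering can be written down as a small lookup so the choice is deliberate rather than ad hoc. A minimal sketch: the tier names and the `pick_model` helper are made up for illustration, and the model labels are plain strings copied from the post, not identifiers from any Cursor API.

```python
from enum import Enum


class Tier(Enum):
    SIMPLE = "simple"
    MEDIUM = "medium"
    COMPLEX = "complex"


# Model labels are plain strings taken from the post above.
MODEL_BY_TIER = {
    Tier.COMPLEX: "GPT-5.4",
    Tier.MEDIUM: "GPT-5.3 Codex",  # the post also lists GPT-5.1 for this tier
    Tier.SIMPLE: "Composer 2",
}


def pick_model(tier: Tier) -> str:
    """Return the model label to select by hand for a task of this tier."""
    return MODEL_BY_TIER[tier]


print(pick_model(Tier.SIMPLE))  # Composer 2
```

Deciding the tier before opening the chat is the useful part. The lookup just keeps the habit consistent.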
Sonnet 4.6 Max for debugging.
First thing: I never use Max (Opus 4.7 Max or GPT 5.5 Max) and I don’t recommend it. I tried it on Cursor; it’s expensive and most of the time not worth it. I use Opus 4.7 only when it’s 50% off, and the same goes for GPT 5.5.
I stick with GPT 5.5 Medium. It runs at a 50% rate, so it’s much friendlier on your included usage. Attaching a screenshot of my actual setup so you can see.
My daily setup:
→ GPT 5.5 Medium → planning
→ GPT 5.5 Low Think / Composer 2 → execution
→ Codex 5.3 / GPT 5.4 / Composer 2 → most other tasks
We don’t need a smart model for every task. Learn from your own experience which frontier models to use and when. No need to pay for everything, and definitely no need to jump straight to Max.
On workflow — this is the approach I use:
→ Create PRD
→ Create RFC based on the PRD
→ From the PRD you get user stories, right? Each user story becomes a single task
→ Work on it in a single chat
→ Bug? Fix it there
→ Context almost full? New chat, reference the past chat
Rule of thumb: 1 Task, 1 Chat. And always use TDD (test-driven development), so you know up front what may fail and what defines success.
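To make the TDD step concrete: the tests come first and pin down what "success" and "failure" mean for one user story, then the implementation is written (or generated) until they pass. A minimal sketch in plain Python; the user story and the `apply_discount` function are hypothetical examples, not from any real project.

```python
# Step 1 (written first): tests for the user story
# "as a shopper, I get a percentage discount applied to my cart total".
def test_applies_percentage():
    assert apply_discount(200.0, 25) == 150.0


def test_zero_percent_keeps_total():
    assert apply_discount(99.99, 0) == 99.99


def test_rejects_invalid_percent():
    # Defines the expected failure mode up front.
    try:
        apply_discount(100.0, 150)
    except ValueError:
        return
    raise AssertionError("expected ValueError for percent > 100")


# Step 2 (written after the tests): the implementation.
def apply_discount(total: float, percent: float) -> float:
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(total * (1 - percent / 100), 2)


if __name__ == "__main__":
    for test in (test_applies_percentage,
                 test_zero_percent_keeps_total,
                 test_rejects_invalid_percent):
        test()
    print("all tests passed")
```

The test functions follow the `test_*` naming convention, so a runner such as pytest would also pick them up. Giving the model the tests as the task's acceptance criteria fits the 1 Task, 1 Chat rule.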
On Flutter / FastAPI specifically — I’m mostly on React/Next.js so I can’t speak from direct experience there. But the model selection + workflow logic above should still hold regardless of stack.
Opus-4.7 is very, very good for me; it’s worth the expense. In my experience it’s better than GPT-5.5, which is also expensive but is getting a lot of usage from me right now thanks to the price savings.
Great share. Question: are you simply toggling manually between 5.5 Medium and Low Think? I wanted to pin different versions for easy model switching, but as far as I know there’s no way to do that.
I wanted to ask about SpecKit / spec-driven development — has anyone tried it for large projects in Cursor? Does it help reduce hallucinations and improve consistency?
Also, do you think using a single strong model like Opus with this workflow is a good approach, or would it be a waste of tokens in practice?