The question is more about general usage. My Claude usage costs several times more than gpt-5, even though there's hardly any such difference in their API pricing.
Early Sonnet 4.5 Review:
It seems not to act like an AI the way Sonnet 4 did. It acts almost human: it's far more likely to make a change in the same process/way that I would myself. Straightforward, doesn't waste time. At the same time, there are major improvements to writing code from scratch; it does that properly, like a human, as well. Relative to Sonnet 4, it's kind of like what gpt-5-codex is to gpt-5, if you get what I mean.
These benchmark results can be misleading, especially when published by the model's own company; bias is inevitable. At the end of the day, they're more about selling you than proving real-world coding performance.
Update after Further use of Sonnet 4.5:
It has a strong aversion to chat summarization. It's very aware of its token usage, sometimes to the detriment of output quality: it tries to finish the task before summarization kicks in automatically. I've observed it going step by step through the todos (as instructed), but once it reaches around 80% of token usage, it says something like "Due to length constraints, let me focus on the most critical remaining tasks," then rushes through the final tasks. I think the agent design needs to be improved for 4.5 specifically, on Cursor's side: please tweak the threshold so it properly summarizes earlier, around 80% token usage, or at least so the agent understands summarization and doesn't rush. Until then, it requires more careful use of manual /summarization.
In Cursor, a single Sonnet-4.5 conversation now deducts as many requests as multiple conversations would, once you add content to the message queue and receive feedback from MCP!
After using the message queue and MCP feedback to add information, Sonnet-4.5 in Cursor now deducts more request counts. Previously, gpt-5 was counted as one request, but now it directly deducts several times the consumption.
As they say, if you don't use it with thinking, it costs the same as Sonnet 4. However, billing the thinking model at a flat 2x is extremely expensive.
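If the multipliers work as described above (this is an assumption based on the posts, not confirmed billing logic: 1x per request without thinking, 2x with thinking enabled), the cost difference is easy to sketch:

```python
# Hypothetical illustration of the request-multiplier billing described above.
# The 1x/2x factors are taken from the posts; actual Cursor billing may differ.
def billed_requests(requests: int, thinking: bool) -> int:
    """Return how many requests get deducted for a given number of conversations."""
    multiplier = 2 if thinking else 1
    return requests * multiplier

print(billed_requests(100, thinking=False))  # 100 conversations billed as 100
print(billed_requests(100, thinking=True))   # the same 100 billed as 200
```

Under these assumed factors, enabling thinking doubles the deducted requests for identical usage, which matches the complaint above.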