TL;DR:
- Model does worse with thinking which cannot be disabled
- Thinking obviously worsens the rate limit issue, making the best coding model ever created even less usable in Cursor.
Since I’ve updated, the model is only available with thinking mode turned on. In my experience, thinking often makes the model self-sabotage function calls. It will “think” it made a function call when in reality it only thought about it.
The decision also doesn’t make any sense in terms of rate limits – why enforce (often unwanted) thinking mode when you’re already constantly out of quota with this model?
By saving thinking tokens, you save developers time, IMO.