Cursor has an opportunity being the arbitrator of model selection that is currently uninspiring. The pricing challenges and community uproar in the topic Pricing Megathread speak volumes to both a problem and opportunity to set cursor apart from the commodity market.
There is a big gap between model capability and auto mode could be a big win if that mode were utilized to use the best thinking model to supervise the less capable auto mode ChatGPT or even ollama local models.
An example easily comes to mind in that if I use auto today and give the model a written document so there no ambiguity with, for example, how to address fixing a test, the auto model often ignores the fact that the implementation should not be changed to satisfy a defect and it should stop and get direction or refer to the requirements.
Instead the auto model will put mocks in the implementation or whatever is the fastest, but least beneficial way, to address that test. Then when you ask the model, "did you see that rule about doing x but not y, it says yes I shouldn’t have done that.
Whenever I see the auto model chat stream going in a direction where the model is about to do something stupid or against the rules I just provided I have to quickly stop the auto model and switch to a capable model - if I can read fast enough! Imagine if the chat stream were going at two times the speed, it would be impossible for me to intervene.
Ideally, there would be a supervisor model that was Strategy model ($$$), a Boundary model ($$) a worker auto model ($) that did the work. The Strategy model would take written (pinned requirements) documents and formulate next steps that are optimized to stay within requirement but plan next steps (these are currently the task list). Then the Boundary model would be set up to keep the worker auto in line and restrict it from going off the road.
Cost is a real concern and the only way to balance cost and quality is to have a tiered execution chain. As the performance improves there is no way the user can be the supervisor or the guard rails. And without this leverage approach the user just gets frustrated and walks away because of cost and quality.
The pinned requirements (doc’s) is a necessary step in this execution optimization. Without it, context is quickly lost. I’m seriously at a point where I have to use two tools, one for complex tasks and one for simple editing but that approach is simply a stop gap and I’m always looking for something better.