Add Support for Model Picker for Full File Inline Edit

I often prefer the inline chat UI. Often my requests don’t benefit from watching the LLM stream its responses as it “plans” or “thinks”; rather, I just want to see the proposed changes ASAP inline so I can apply them or see them in context.

Unfortunately, giving the “full file” scope to the inline chat removes the ability to choose the model, and it often feels like this model is quite a lot dumber than the options available in chat.

Can we select the model for inline?

I would also say that the inline model feels a bit slow to go from prompt to changes. I have a feeling that is because the model is actually streaming all the planning and thinking back to Cursor, but the UI is just not showing it. Could this be improved by getting the model to skip the thinking?

Hey, thanks for the suggestion.

While I don’t personally know why this isn’t available, it seems like a good feature to have.
I have passed this to the team to look into!

Thanks buddy!

Hey, spoke to the team on this one.

This is actually done on purpose: a full-file edit can be slow and not the smoothest with certain models, so we make the opinionated decision to fix the model for you.

While this does unfortunately mean it’s slightly less configurable, this does have the benefit that full-file edits like this have a tiny impact on rate limits vs the same edit done via the Agent in the chat.

Thanks for getting back to me!

Interesting. I appreciate that the default should be auto in this case, but given that the point of being able to choose the model in chat, etc., is to adapt the IDE so that it delivers the best results for the task and codebase, I think it would be cool if you reconsidered this. For example, what might be “not the smoothest” to you in your codebase might be a great fit for me in my codebase.

The other thing I wonder whether you are considering is that the chat stream is incredibly visually noisy if you are being efficient and continuing to think/work while the LLM is generating your changes. Using the inline chat to avoid showing all the thinking is a UX improvement when you know you just want to quickly get some results, even if it takes the same amount of time.

Which brings me to my other question: do you have any thoughts about improving the system prompt, or allowing the user to configure the system prompt for just the inline full-file mode, so that it is faster?