Utilise the GPU?

Feature request for product/service

Cursor IDE

Describe the request

It might not be possible, but I was playing with Ollama using a custom model and it is so fast. I also know the GPU has memory limits.
I was wondering if Cursor could take a hybrid approach: use the local GPU for chat, organising things, and simple tasks, while the remote LLM does the deep thinking.
This could speed things up significantly if it worked, and would let us use deep thinking a lot more without breaking the bank.
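To make the idea concrete, here is a minimal sketch of the routing logic I have in mind. Everything here is hypothetical: the endpoint URLs, the task names, and the `route` function are illustrative placeholders, not Cursor internals (the local URL is just Ollama's default port).

```python
# Hypothetical sketch of the hybrid approach: simple tasks go to a local
# model on the GPU, heavy reasoning goes to the remote LLM.

LOCAL_ENDPOINT = "http://localhost:11434/api/generate"  # default Ollama port
REMOTE_ENDPOINT = "https://api.example.com/v1/chat"     # placeholder remote API

# Task types cheap enough for a small local model (illustrative list)
SIMPLE_TASKS = {"chat", "rename", "summarise", "organise"}

def route(task: str) -> str:
    """Pick an endpoint: local GPU for simple tasks, remote LLM otherwise."""
    return LOCAL_ENDPOINT if task in SIMPLE_TASKS else REMOTE_ENDPOINT

print(route("chat"))           # local GPU handles it
print(route("deep_refactor"))  # falls through to the remote LLM
```

In practice the routing decision would presumably be smarter than a keyword set, but even a crude split like this would keep the cheap, frequent requests off the paid API.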

Operating System (if it applies)

Windows 10/11