I am no longer able to chat with my own self-hosted model in “Ask” mode due to “Unauthorized User API key” errors. A couple of days ago it worked just fine; now it does not.
The custom model with the associated key works fine elsewhere, such as in Cline/Kilo Code.
Steps to Reproduce
If required, I can separately provide the custom OpenAI endpoint and the API key for testing.
Expected Behavior
In “Ask” mode, I should be able to chat with my own model, which is hosted on a custom endpoint that implements the OpenAI API specification.
Operating System
Windows 10/11
Current Cursor Version (Menu → About Cursor → Copy)
I haven’t noticed any changes when using my proxy, which also implements the OpenAI API; the only difference is that it uses model names like gpt-[high/medium/low/minimal] instead of the reported gpt-4o.
I’ve checked with 1.7.44 and 1.7.46, and Cursor keeps making the same POST /chat/completions requests as usual, and they work with ask/agent/custom, which makes me think the issue is more related to the proxy.
@syllil Are you using a closed source proxy, or is it something you can share?
@syllil Thanks for sharing, interesting service! The issue seems to be that Azure stopped supporting the /completions API for gpt-4o (and any other models except gpt-3.5):
> curl https://openai.softronic.ai/v1/completions -H "Content-Type: application/json" -H "Authorization: Bearer $OPENAI_API_KEY" -d '{
"model": "gpt-4o",
"prompt": "Say this is a test",
"max_tokens": 7,
"temperature": 0,
"stream": true
}'
{"error":{"code":"OperationNotSupported","message":"The completion operation does not work with the specified model, gpt-4o. Please choose different model and try again. You can learn more about which models can be used with each operation here: https://go.microsoft.com/fwlink/?linkid=2197993.","type":null,"param":null,"inner_error":null}}
The only model from Azure/Softronic that seems to work with /completions is gpt-3.5-turbo-instruct. And neither Softronic nor Cursor supports BYOK for models that are only served through the /responses API.
Hence the existence of my proxy, which serves /responses-only models (such as GPT-5) through the old /completions API that Cursor supports.
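To give a rough idea of what such a translation layer involves, here is a minimal sketch in Python of mapping a legacy /completions request body (like the one in the curl call above) onto a /responses request body. It is only an illustration, not the actual code of the proxy: the function name `completions_to_responses` is hypothetical, and the field mapping (`prompt` to `input`, `max_tokens` to `max_output_tokens`) is an assumption based on the public OpenAI API request shapes. A real proxy would also have to translate the streamed response back, which is not shown here.

```python
from typing import Any, Dict


def completions_to_responses(body: Dict[str, Any]) -> Dict[str, Any]:
    """Map a legacy /completions request body to a /responses request body.

    Hypothetical helper for illustration; it only handles a single string prompt.
    """
    prompt = body.get("prompt")
    if not isinstance(prompt, str):
        raise ValueError("this sketch only handles a single string prompt")

    # Assumption: /responses accepts the prompt text directly as "input".
    out: Dict[str, Any] = {"model": body["model"], "input": prompt}

    # Assumption: the token limit is called "max_output_tokens" on /responses.
    if "max_tokens" in body:
        out["max_output_tokens"] = body["max_tokens"]

    # Pass through sampling and streaming options that share the same name.
    for key in ("temperature", "top_p", "stream"):
        if key in body:
            out[key] = body[key]
    return out


if __name__ == "__main__":
    legacy = {
        "model": "gpt-5",
        "prompt": "Say this is a test",
        "max_tokens": 64,
        "stream": True,
    }
    print(completions_to_responses(legacy))
```

The translated body would then be sent to the upstream /responses endpoint, and the upstream reply converted back into the shape the client expects.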
For this to work, Softronic would need to implement a proxy similar to mine, which could keep serving models through /completions indefinitely.
In case you work at Softronic, I’m open to freelance opportunities or feature sponsors for my project! My email is me at gabrii dot com