Fresh bugs with custom model

Where does the bug appear (feature/product)?

Cursor IDE

Describe the Bug

Hi!
I’m using the GLM-4.7 model with the OpenAI endpoint and API key replaced. I understand this is something of a hack, but you promised support for custom models.
First, why can’t I use the other models (Claude, Gemini, Composer, Grok, Kimi, etc.) when I’ve changed only the OpenAI endpoint? It’s fine if ChatGPT isn’t available in this case, but the other models should be. This is as strange as the impossibility of adding any model I want, via OpenRouter, Qwen, or anything else.
But today the situation got much worse. I can’t use a custom model at all: I get the error “Invalid model. The model GLM-4.7 does not work with your current plan or api key.”
I have a Pro Plus subscription, and now I’m seriously considering changing IDEs because of these issues. I have no clue why you limit paying customers to built-in models only. Just to resell tokens?

Steps to Reproduce

Use a custom model with the API key and endpoint entered in the OpenAI API key settings

Operating System

Windows 10/11

Current Cursor Version (Menu → About Cursor → Copy)

Version: 2.3.35 (user setup)
VSCode Version: 1.105.1
Commit: cf8353edc265f5e46b798bfb276861d0bf3bf120
Date: 2026-01-13T07:39:18.564Z
Electron: 37.7.0
Chromium: 138.0.7204.251
Node.js: 22.20.0
V8: 13.8.258.32-electron.0
OS: Windows_NT x64 10.0.26200

Does this stop you from using Cursor

Yes - Cursor is unusable

Hey, thanks for the report.

This is a known issue. When “Override OpenAI Base URL” is enabled, it affects all API keys and models, including Cursor’s built-in models (Claude, Gemini, etc.). The team is working on a fix, but for now here’s a workaround:

  • Turn off “Override OpenAI Base URL” when you want to use Cursor’s standard models
  • Turn it back on only when you need GLM-4.7
  • Switch it manually depending on which model you’re using

A similar issue was discussed here for Anthropic: Anthropic models break when Override OpenAI BaseUrl is set

And specifically for GLM-4.7: Cursor Models Fail When Using BYOK OpenAI Key with Overridden Base URL (GLM-4.7)

I’ll pass your details to the team.

Hi! Thank you for the answer.
But we have two issues here. The first is other models being overridden; the second is “Invalid model. The model GLM-4.7 does not work with your current plan or api key.” The second issue makes Cursor totally unusable with an external model.

Also, if I follow the workaround for the first issue, then each time I need to enable the custom model I have to enter the endpoint manually again; it doesn’t save.

Yes, that problem (the endpoint not being saved when you toggle the switch) is also a known bug. The team is aware and is working on fixing the whole base URL override system.

Unfortunately, the current workaround is to manually enter the endpoint every time. The only alternative is to save the endpoint in a note or text file and copy it in when you turn the toggle on.

I know that’s inconvenient, which is why we’re planning to add the option to set a separate base URL for each custom model. That should fix both issues.

I have this same issue. GLM-4.7 works with my Ultra subscription, but toggling the switch is a pain. Cursor should use GLM-4.7 to replace the lousy Composer1, by the way…

Hi Dean,
Do you have any ETA for these fixes?

Hey, unfortunately I don’t have an exact ETA. The issue is in our backlog, and the team is working on it, but I can’t share a specific date yet.

I get that the workaround is inconvenient. I’ll let you know here once the fix is out.

This is an issue on your side; GLM should work unless you’re on the free plan (BYOK doesn’t work on the free plan) or you pasted the key/endpoint incorrectly.

As for pasting the endpoint each time: disable the “OpenAI API key” toggle to use Cursor models, not the “Override…” toggle. That way your endpoint stays saved.

Also facing the same issue today. Oddly enough, GLM-4.7 was working just fine until this morning. Then I restarted Cursor for an update and now it doesn’t work. I’m on a Pro plan. Cursor Version: 2.4.22 (Universal)

Also facing this issue, can’t connect any GLM models to cursor. Checked API, it is working ok

Is there any news on ETA when this bug will be fixed?

Receiving this error message now:

Request ID: 5600d4cf-f01b-4d5b-8861-5eefe3107f76
AI Model Not Found Model name is not valid: “GLM-4.7”
F4t: AI Model Not Found Model name is not valid: “GLM-4.7”
at Gmf (vscode-file://vscode-app/Applications/Cursor.app/Contents/Resources/app/out/vs/workbench/workbench.desktop.main.js:9095:38263)
at Hmf (vscode-file://vscode-app/Applications/Cursor.app/Contents/Resources/app/out/vs/workbench/workbench.desktop.main.js:9095:37251)
at rpf (vscode-file://vscode-app/Applications/Cursor.app/Contents/Resources/app/out/vs/workbench/workbench.desktop.main.js:9096:4395)
at fva.run (vscode-file://vscode-app/Applications/Cursor.app/Contents/Resources/app/out/vs/workbench/workbench.desktop.main.js:9096:8170)
at async Hyt.runAgentLoop (vscode-file://vscode-app/Applications/Cursor.app/Contents/Resources/app/out/vs/workbench/workbench.desktop.main.js:34196:57047)
at async Zpc.streamFromAgentBackend (vscode-file://vscode-app/Applications/Cursor.app/Contents/Resources/app/out/vs/workbench/workbench.desktop.main.js:34245:7695)
at async Zpc.getAgentStreamResponse (vscode-file://vscode-app/Applications/Cursor.app/Contents/Resources/app/out/vs/workbench/workbench.desktop.main.js:34245:8436)
at async FTe.submitChatMaybeAbortCurrent (vscode-file://vscode-app/Applications/Cursor.app/Contents/Resources/app/out/vs/workbench/workbench.desktop.main.js:9170:14575)
at async Ei (vscode-file://vscode-app/Applications/Cursor.app/Contents/Resources/app/out/vs/workbench/workbench.desktop.main.js:32994:3808)

Same problem

Same problem. Using v2.4.27. Unable to use OpenAI/OpenRouter models

Update: Still broken in v2.4.28

wow, cancelled my cursor sub a month or two ago and came back to try with the z.ai keys.. wow…

I had the same problem on an older version, 2.4.28. Now I’ve upgraded to 2.4.31 and get a new error message: “Free plans can only use Auto. Switch to Auto or upgrade plans to continue.” Is this still a bug, or do I need to upgrade my plan to use custom models?

Don’t bother; I have a subscription and my OpenRouter API key is still broken in Version: 2.4.31

I ended up installing the Roo Code extension to use my OpenRouter key :frowning:. I’ve also started using Antigravity. It’s still in preview, so it has higher limits on the free plan.

Update: still broken in Version: 2.4.36


Adding another data point — same core issue (BYOK custom model broken), plus context on why this is a blocker for high-volume Team Plan users.


My setup

  • Model: GLM-5 (via Z.AI, OpenAI-compatible endpoint)
  • Override OpenAI Base URL: https://api.z.ai/api/coding/paas/v4
  • OS: Windows 11 (WSL2 Ubuntu 22.04)
  • Cursor Version: (fill in from Menu → About Cursor → Copy)
  • Request ID: ec3d329b-1d0e-432a-93f9-c1513facd078

Issue 1: BYOK broken — same as this thread

Every request returns:

Invalid API key. Unauthorized User API key

The API key works fine via direct curl:

curl -s https://api.z.ai/api/coding/paas/v4/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <my_z_ai_api_key>" \
  -d '{"model":"GLM-5","messages":[{"role":"user","content":"ping"}],"max_tokens":10}'
# → HTTP 200, valid response

Dashboard shows User API Key | GLM-5 | 0 tokens | $0.00 — the request hits Cursor’s proxy but auth fails before reaching Z.AI.

Tried: fresh chat, re-adding model, re-entering key, restarting Cursor, multiple API keys. Nothing works.
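For reference, here is the same sanity check expressed in Python as a sketch (standard library only; the base URL and model name are the ones from my setup above, and the API key is a placeholder). Building the request as data makes it easy to compare against what Cursor’s proxy should be forwarding:

```python
import json

# Mirror of the curl check above. BASE_URL and the model name come from
# my setup; the API key is a placeholder, not a real credential.
BASE_URL = "https://api.z.ai/api/coding/paas/v4"
API_KEY = "<my_z_ai_api_key>"

url = f"{BASE_URL}/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}",
}
payload = {
    "model": "GLM-5",
    "messages": [{"role": "user", "content": "ping"}],
    "max_tokens": 10,
}

# This is exactly the body that `curl -d` sends; POST it with
# urllib.request or requests to reproduce the HTTP 200 noted above.
body = json.dumps(payload)
print(url)
print(body)
```

A 200 from this request combined with an auth failure inside Cursor points at the proxy layer, which matches the dashboard evidence above.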

This also matches #149214 (Custom Model problems) and #152503 (Using GLM-4.7 in Cursor) — nearly two months of reports now.

@deanrie — Is there a fix in progress? Even a rough ETA would help. Without any communicated timeline, it’s impossible to tell whether to wait or migrate away.


Issue 2: Why this is critical — BYOK is the only cost-isolation option on Team Plan, and Cursor Token Fee undermines it

Why I need BYOK in the first place

My monthly usage: ~1.05 billion tokens/month. Peak single prompt: 19.26M tokens ($14.81). This is legitimate senior engineering workload, not abuse.

The Team Plan has no per-user on-demand spending limit (docs say per-member limits are Enterprise-only). Only team-wide caps exist, so a high-volume individual like me risks consuming the team’s entire on-demand budget. The alternative is constantly monitoring the dashboard and self-throttling — which defeats the purpose of an AI coding assistant.

BYOK was my solution: route inference costs to my own provider (Z.AI GLM Coding Plan) so my usage doesn’t impact my team.

BYOK doesn’t solve it — Cursor Token Fee consumes the included credit

Despite routing inference costs externally via BYOK, the Cursor Token Fee ($0.25/M on ALL tokens, including BYOK, per the Team Pricing docs, “Cursor Token Fee” section) comes to ~$263/month at my volume for the fee alone.

The inference cost sits on Z.AI’s side, yet the Cursor Token Fee alone instantly exhausts the Team Plan’s included credit ($20/user/month) and massively exceeds it. The whole point of BYOK is cost isolation, and the Token Fee negates that.
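The arithmetic behind the ~$263 figure, as a quick sanity check (the token volume and the fee rate are the numbers quoted above; nothing else is assumed):

```python
# Cursor Token Fee back-of-envelope, using the numbers quoted above.
TOKENS_PER_MONTH = 1.05e9   # ~1.05 billion tokens/month (my usage)
FEE_PER_MTOK = 0.25         # $0.25 per million tokens (Team Pricing docs)
INCLUDED_CREDIT = 20.00     # $20/user/month included on the Team Plan

fee = TOKENS_PER_MONTH / 1_000_000 * FEE_PER_MTOK
print(f"monthly token fee: ${fee:,.2f}")
print(f"vs. included credit: {fee / INCLUDED_CREDIT:.1f}x")
```

The fee alone is roughly thirteen times the credit the plan includes, before any inference cost is even counted.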

Cursor Token Fee: stated coverage vs. actual costs

The fee covers three items:

  1. Semantic search (see Cursor docs — “Semantic search” page)
  2. Custom model execution (Tab, Apply, etc.) (see Cursor blog — “instant-apply” post)
  3. Infrastructure

However, examining the actual cost of each reveals a significant gap with the $0.25/MTok rate.

Semantic search:

  • Per the Cursor “semsearch” blog post (Nov 6, 2025) and “secure-codebase-indexing” blog post (Jan 27, 2026), embedding generation happens at indexing time (offline), and unchanged chunks are cached
  • At inference time, the cost is a similarity query against Turbopuffer (VectorDB) — not an embedding model inference. VectorDB query costs are very cheap
  • This feature has been included in subscription since Codebase Context v1 in June 2023 — over 2 years. It was first charged as a Token Fee item in the September 2025 Team Plan pricing change
  • Cursor’s decision to invest in a custom embedding model (semsearch blog: fine-tuned using agent session traces ranked by an LLM) was their own business decision. Passing that R&D cost to users as a “processing fee” levied on inference tokens shifts the cost of a product decision onto customers

Tab/Apply:

  • Apply (instant-apply blog) uses a Llama-3-70b-based fine-tuned model with speculative decoding at ~1000 tokens/sec. A single file rewrite consumes at most a few thousand tokens — a few cents per operation
  • Tab uses an even lighter model, generating a few dozen to a few hundred tokens per completion — effectively zero cost
  • I don’t use Tab completion at all — my workflow is Agent prompts only

Infrastructure:

  • Proxy routing, file sync, etc. Understood as largely fixed costs

Summary:

| Token Fee item | Actual cost at inference time | Proportional to inference tokens? |
| --- | --- | --- |
| Semantic search | VectorDB query (cheap) | No (scales with codebase size) |
| Tab | Lightweight inference, dozens of tokens | No (scales with completion count) |
| Apply | A few thousand tokens, cents per operation | No (scales with file edit count) |
| Infrastructure | Fixed costs | No |

All three items have low actual costs and none are proportional to inference token volume. Yet $0.25/MTok is levied on ALL inference tokens (input + output + cached). At my volume (~1.05B tokens/month), that’s $263/month — but the actual cost of Semantic search is DB query fees, Apply is a few dollars even at hundreds of executions, and Tab is not used at all.

Questions for the Cursor team

  1. The scope of the Cursor Token Fee is unclear. In #148596 (BYOK subtracting from Cursor plan usage), an individual Pro plan user reported that BYOK usage was being deducted from their Pro plan credit, with the dashboard showing User API Key / GLM-4.7 / Cost: Included. @Colin responded: “This is the Cursor Token Fee, which applies to Team plans.” However, the reporter explicitly stated “my Cursor Pro plan” — this was an individual Pro plan report. Documentation only mentions Token Fee for Team plans. Does this response mean that the Cursor Token Fee also applies to individual Pro plan BYOK? If yes, this needs to be documented. If no, then #148596 is a billing bug that needs to be fixed. @deanrie — would appreciate your confirmation on this point as well.

  2. Could BYOK users opt out of fee components for features they don’t use?

  3. Semantic search has been included in subscription since June 2023. The investment in a custom embedding model was Cursor’s decision. Is it appropriate to recoup that cost by levying a per-inference-token fee on users? The actual costs of VectorDB queries and Apply operations appear significantly lower than the $0.25/MTok rate.

  4. Has the team considered billing the Token Fee only on Cursor-side token consumption (Semantic search, Apply) rather than on BYOK inference tokens, or offering a flat monthly rate?


Summary

| Issue | Status | Impact |
| --- | --- | --- |
| BYOK broken | Regression since ~Jan 2026; no fix timeline communicated | Cannot use BYOK at all |
| Cursor Token Fee on BYOK | ~$263/month at my volume | Inference costs are routed externally, yet the included credit is instantly consumed by the fee alone; large gap between the fee rate and the actual costs of the covered items |
| No per-user spend limit | Team Plan lacks them; per-member limits are Enterprise-only | Root cause forcing reliance on BYOK |
| Token Fee scope unclear | Docs and support responses contradict each other on Pro plan applicability | Cannot safely rely on BYOK on any plan |

Net result: I cannot use Cursor professionally at the level I need to. A fix timeline for BYOK and a response on the Token Fee structure would be greatly appreciated.


References (thread numbers due to new-user link limit)

BYOK / Custom model issues: #148815 (this thread), #149214, #152503, #147218 (staff-acknowledged), #140266, #132572

Billing: #148596, #140467

Official blogs: “Updates to Teams pricing” (Aug 2025), “Clarifying our pricing” (Jun 2025), “semsearch” (Nov 2025), “secure-codebase-indexing” (Jan 2026), “instant-apply”

Official docs: Team Pricing page (Cursor Token Fee section), Semantic search page

By the way, during this investigation I prioritized accuracy, so I used claude-4.6-opus-max-thinking (the top-tier model). Doing a rough conversion based on API unit pricing, even Sonnet (a slightly lower tier) would still have burned about 53% of the included allowance ($10.60 / $20)… latest models are pricey, lol.

@imura Just use Roo Code extension if you can.

It’s still broken for me in Version: 2.5.20