Adding another data point — same core issue (BYOK custom model broken), plus context on why this is a blocker for high-volume Team Plan users.
My setup
- Model: GLM-5 (via Z.AI, OpenAI-compatible endpoint)
- Override OpenAI Base URL: https://api.z.ai/api/coding/paas/v4
- OS: Windows 11 (WSL2 Ubuntu 22.04)
- Cursor Version: (fill in from Menu → About Cursor → Copy)
- Request ID: ec3d329b-1d0e-432a-93f9-c1513facd078
Issue 1: BYOK broken — same as this thread
Every request returns:
`Invalid API key. Unauthorized User API key`
The API key works fine via direct curl:
```shell
curl -s https://api.z.ai/api/coding/paas/v4/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <my_z_ai_api_key>" \
  -d '{"model":"GLM-5","messages":[{"role":"user","content":"ping"}],"max_tokens":10}'
# → HTTP 200, valid response
```
Dashboard shows User API Key | GLM-5 | 0 tokens | $0.00 — the request hits Cursor’s proxy but auth fails before reaching Z.AI.
Tried: fresh chat, re-adding model, re-entering key, restarting Cursor, multiple API keys. Nothing works.
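For anyone who wants to script the same sanity check, here is a stdlib-only Python equivalent of the curl test above. The endpoint and model name are taken from this report; `ZAI_API_KEY` is a placeholder environment variable, not an official name:

```python
# Stdlib-only reproduction of the direct-API check outside Cursor.
# Endpoint and model taken from the report above; ZAI_API_KEY is a
# placeholder env var of my choosing, not an official variable name.
import json
import os
import urllib.request

BASE_URL = "https://api.z.ai/api/coding/paas/v4"

def build_request(api_key: str) -> urllib.request.Request:
    """Build the same chat-completions request the curl command sends."""
    payload = {
        "model": "GLM-5",
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 10,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

if __name__ == "__main__":
    key = os.environ.get("ZAI_API_KEY")
    if key:
        # A valid key should return HTTP 200 with a chat completion body.
        with urllib.request.urlopen(build_request(key), timeout=30) as resp:
            print(resp.status)
            print(resp.read().decode())
    else:
        print("Set ZAI_API_KEY to run the live check.")
```

Same result as curl: the key authenticates fine against Z.AI directly, so the failure is on Cursor's side of the proxy.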
This also matches #149214 (Custom Model problems) and #152503 (Using GLM-4.7 in Cursor) — nearly two months of reports now.
@deanrie — Is there a fix in progress? Even a rough ETA would help. Without any communicated timeline, it’s impossible to tell whether to wait or migrate away.
Issue 2: Why this is critical — BYOK is the only cost-isolation option on Team Plan, and Cursor Token Fee undermines it
Why I need BYOK in the first place
My monthly usage: ~1.05 billion tokens/month. Peak single prompt: 19.26M tokens ($14.81). This is legitimate senior engineering workload, not abuse.
The Team Plan has no per-user on-demand spending limit (docs say per-member limits are Enterprise-only). Only team-wide caps exist, so a high-volume individual like me risks consuming the team’s entire on-demand budget. The alternative is constantly monitoring the dashboard and self-throttling — which defeats the purpose of an AI coding assistant.
BYOK was my solution: route inference costs to my own provider (Z.AI GLM Coding Plan) so my usage doesn’t impact my team.
BYOK doesn’t solve it — Cursor Token Fee consumes the included credit
Despite routing inference costs externally via BYOK, Cursor still charges the Cursor Token Fee: $0.25/MTok on ALL tokens, including BYOK traffic (per the Team Pricing docs, “Cursor Token Fee” section). At my volume that comes to ~$263/month for the fee alone.
The inference cost sits on Z.AI’s side, yet the Token Fee by itself instantly exhausts the Team Plan’s included credit ($20/user/month) and exceeds it roughly 13-fold. The whole point of BYOK is cost isolation; the Token Fee negates it.
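The fee arithmetic is straightforward to check. Token volume, fee rate, and included credit are the figures stated above:

```python
# Cursor Token Fee at the reported volume: $0.25 per million tokens,
# applied to all tokens (input + output + cached), per the Team Pricing docs.
TOKENS_PER_MONTH = 1_050_000_000   # ~1.05B tokens/month (reported usage)
FEE_PER_MTOK = 0.25                # USD per million tokens
INCLUDED_CREDIT = 20.00            # USD included per user/month on Team Plan

fee = TOKENS_PER_MONTH / 1_000_000 * FEE_PER_MTOK
print(f"Monthly Token Fee: ${fee:,.2f}")                             # → Monthly Token Fee: $262.50
print(f"Multiple of included credit: {fee / INCLUDED_CREDIT:.1f}x")  # → Multiple of included credit: 13.1x
```

$262.50/month in fees against a $20/month credit, before a single dollar of inference is billed to Cursor.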
Cursor Token Fee: stated coverage vs. actual costs
The fee covers three items:
- Semantic search (see Cursor docs — “Semantic search” page)
- Custom model execution (Tab, Apply, etc.) (see Cursor blog — “instant-apply” post)
- Infrastructure
However, examining each item’s actual cost reveals a significant gap between those costs and the $0.25/MTok rate.
Semantic search:
- Per the Cursor “semsearch” blog post (Nov 6, 2025) and “secure-codebase-indexing” blog post (Jan 27, 2026), embedding generation happens at indexing time (offline), and unchanged chunks are cached
- At inference time, the cost is a similarity query against Turbopuffer (VectorDB) — not an embedding model inference. VectorDB query costs are very cheap
- This feature has been included in subscription since Codebase Context v1 in June 2023 — over 2 years. It was first charged as a Token Fee item in the September 2025 Team Plan pricing change
- Cursor’s decision to invest in a custom embedding model (semsearch blog: fine-tuned using agent session traces ranked by LLM) was their own business decision. Passing that R&D cost to users as a “processing fee” levied on inference tokens is cost-shifting from a product decision
Tab/Apply:
- Apply (instant-apply blog) uses a Llama-3-70b-based fine-tuned model with speculative decoding at ~1000 tokens/sec. A single file rewrite consumes at most a few thousand tokens — a few cents per operation
- Tab uses an even lighter model, generating a few dozen to a few hundred tokens per completion — effectively zero cost
- I don’t use Tab completion at all — my workflow is Agent prompts only
Infrastructure:
- Proxy routing, file sync, etc. Understood as largely fixed costs
Summary:
| Token Fee item | Actual cost at inference time | Proportional to inference tokens? |
| --- | --- | --- |
| Semantic search | VectorDB query (cheap) | No (scales with codebase size) |
| Tab | Lightweight inference, dozens of tokens | No (scales with completion count) |
| Apply | A few thousand tokens, cents per operation | No (scales with file edit count) |
| Infrastructure | Fixed costs | No |
All three items have low actual costs, and none is proportional to inference token volume. Yet $0.25/MTok is levied on ALL inference tokens (input + output + cached). At my volume (~1.05B tokens/month) that is ~$263/month, while the actual cost of semantic search is DB query fees, Apply comes to a few dollars even at hundreds of executions, and Tab is not used at all.
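To make the gap concrete, here is a rough back-of-envelope comparison. The fee side uses the figures stated above; every per-item "actual cost" number (Apply operation count, tokens per Apply, per-token rate, VectorDB spend) is an assumption I am making for illustration, not a number from Cursor:

```python
# Rough comparison: fee charged vs. estimated actual cost of the covered
# items, at the reported volume. ALL per-item figures below are my own
# illustrative assumptions, not Cursor's real cost data.
TOKENS_PER_MONTH = 1_050_000_000
fee = TOKENS_PER_MONTH / 1_000_000 * 0.25       # $262.50/month, as above

# Assumed actual costs (hypothetical):
apply_ops = 500                                  # file edits per month (assumed)
apply_cost = apply_ops * 3_000 / 1_000_000 * 1.00  # ~3k tokens/op at an assumed $1/MTok
search_cost = 5.00                               # assumed monthly VectorDB query spend
tab_cost = 0.00                                  # Tab not used in this workflow

estimated_actual = apply_cost + search_cost + tab_cost
print(f"Fee charged:      ${fee:,.2f}")          # → Fee charged:      $262.50
print(f"Estimated actual: ${estimated_actual:,.2f}")
print(f"Fee is ~{fee / estimated_actual:.0f}x the estimated actual cost")
```

Even with generous assumptions on the "actual cost" side, the fee is more than an order of magnitude above what the covered items plausibly cost to serve.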
Questions for the Cursor team
1. The scope of the Cursor Token Fee is unclear. In #148596 (BYOK subtracting from Cursor plan usage), an individual Pro plan user reported that BYOK usage was being deducted from their Pro plan credit, with the dashboard showing User API Key / GLM-4.7 / Cost: Included. @Colin responded: “This is the Cursor Token Fee, which applies to Team plans.” However, the reporter explicitly stated “my Cursor Pro plan”; this was an individual Pro plan report, and the documentation mentions the Token Fee only for Team plans. Does that response mean the Cursor Token Fee also applies to individual Pro plan BYOK? If yes, this needs to be documented. If no, then #148596 is a billing bug that needs to be fixed. @deanrie — would appreciate your confirmation on this point as well.
2. Could BYOK users opt out of fee components for features they don’t use?
3. Semantic search has been included in the subscription since June 2023, and the investment in a custom embedding model was Cursor’s own decision. Is it appropriate to recoup that cost via a per-inference-token fee on users? The actual costs of VectorDB queries and Apply operations appear significantly lower than the $0.25/MTok rate.
4. Has the team considered billing the Token Fee only on Cursor-side token consumption (semantic search, Apply) rather than on BYOK inference tokens, or offering a flat monthly rate?
Summary
| Issue | Status | Impact |
| --- | --- | --- |
| BYOK broken | Regression since ~Jan 2026, no fix timeline communicated | Cannot use BYOK at all |
| Cursor Token Fee on BYOK | ~$263/month at my volume | Inference costs routed externally, yet the included credit is instantly consumed by the fee alone; large gap between fee rate and actual costs of covered items |
| No per-user spend limit | Team Plan only; Enterprise required | Root cause forcing reliance on BYOK |
| Token Fee scope unclear | Docs and support responses contradict on Pro plan applicability | Cannot safely rely on BYOK on any plan |
Net result: I cannot use Cursor professionally at the level I need to. A fix timeline for BYOK and a response on the Token Fee structure would be greatly appreciated.
References (thread numbers due to new-user link limit)
BYOK / Custom model issues: #148815 (this thread), #149214, #152503, #147218 (staff-acknowledged), #140266, #132572
Billing: #148596, #140467
Official blogs: “Updates to Teams pricing” (Aug 2025), “Clarifying our pricing” (Jun 2025), “semsearch” (Nov 2025), “secure-codebase-indexing” (Jan 2026), “instant-apply”
Official docs: Team Pricing page (Cursor Token Fee section), Semantic search page