LiteLLM Proxy cost per request tracker

Hello,

Suppose we have an LLM proxy running in an enterprise. How can I find out the cost of each request that hits the LLM and correlate it with a request ID coming from my app?

Do we have an exact cost per request?
If not, can we estimate the cost? What's the accuracy of an estimate based on completion and input tokens?

Hey, to answer more accurately, can you clarify: are you using Cursor via BYOK with a custom base URL pointing to your LiteLLM proxy, or is this a separate setup with no Cursor in the chain?

On the main question (this is more about LiteLLM than Cursor):

  • Exact cost per request: LiteLLM calculates the cost of each call from its model pricing table and the provider usage in the response. You can access it in callbacks as response_cost and in the x-litellm-response-cost header. For OpenAI, Anthropic, and Gemini this is effectively exact, because the provider returns real token counts rather than estimates.
  • Streaming nuance: the x-litellm-response-cost header is not returned in streaming mode, because headers are sent before the stream finishes and before the cost is known. See open issue #12689 https://github.com/BerriAI/litellm/issues/12689. For streaming, derive the cost from the usage in the final chunk (enable it with stream_options: {include_usage: true}), or read it from the success_callback or SpendLogs after the request completes.
  • Token-based estimate before the response: you can use litellm.token_counter and litellm.completion_cost based on input plus estimated output, but for billing you should rely on the actual value after the response.
  • Correlating with your app request id: pass your id in metadata since the OpenAI-compatible request body supports custom fields via LiteLLM, or pass it via a custom header like x-trace-id. LiteLLM logs this into its DB and any connected callbacks like Langfuse, Prometheus, or a custom logger. Then you join by that id on your side.
  • Storage: run LiteLLM Proxy with a Postgres backend. The LiteLLM_SpendLogs table stores per-request cost, model, tokens, metadata, and request id.
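To make the points above concrete, here is a rough sketch in plain Python with no LiteLLM imports. It shows the arithmetic behind a token-based cost, how to pick the usage out of a streamed response, and how to join cost back to your own request id. Everything named here is an assumption for illustration: the "example-model" prices, the app_request_id metadata key, and the shape of the spend rows (a dict with "metadata" and "spend") are hypothetical, so check them against your own LiteLLM version and schema.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical per-token prices in USD. Real values come from
# LiteLLM's model pricing table, not from this dict.
PRICES = {"example-model": {"input": 3e-06, "output": 1.5e-05}}


def cost_from_usage(model: str, prompt_tokens: int, completion_tokens: int,
                    prices: Dict[str, Dict[str, float]] = PRICES) -> float:
    """Token-based cost: input tokens * input price + output tokens * output price."""
    p = prices[model]
    return prompt_tokens * p["input"] + completion_tokens * p["output"]


def usage_from_stream(chunks: Iterable[dict]) -> Optional[dict]:
    """Return the usage dict from the last chunk that carries one, else None.

    With include_usage enabled, usage normally arrives on the final chunk.
    """
    usage = None
    for chunk in chunks:
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage


def tag_request(body: dict, app_request_id: str) -> dict:
    """Return a copy of the request body with the app's id under metadata.

    The key name app_request_id is an assumption; use any key you like.
    """
    tagged = dict(body)
    tagged["metadata"] = {**body.get("metadata", {}), "app_request_id": app_request_id}
    return tagged


def cost_by_app_request(spend_rows: List[dict]) -> Dict[str, float]:
    """Sum spend per app request id from rows shaped like the assumed spend log."""
    totals: Dict[str, float] = {}
    for row in spend_rows:
        rid = row.get("metadata", {}).get("app_request_id")
        if rid is None:
            continue  # row was not tagged by the app
        totals[rid] = totals.get(rid, 0.0) + row["spend"]
    return totals
```

Summing per id rather than assuming one row per id is deliberate: a single app request can fan out into several LLM calls (retries, tool loops), and each of those would be logged separately.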

From the Cursor side, per-request cost and request id are not exposed. The Admin API only returns aggregated usage by model and date, so cost tracking makes the most sense at the LiteLLM layer.

If you still want a Cursor-specific angle, tell me exactly how Cursor is in the chain and I’ll point out what’s available.