Different token types

Curious what each token type actually is?
Also curious whether we know anything about the cost of each type, and, even without knowing the limits, whether certain token types eat into that limit more than others, and by how much?

As far as my understanding goes:
Input: the prompt
Output: the output from the LLM
Write Cache: writing to a local (?) cache
Read Cache: reading from a cache instead of re-processing something previously discussed

I’m curious about this because my limits are hit so quickly. Looking at my Usage, the majority of the tokens I’m using are Read Cache, but I don’t think that should affect the rate limits as much as the other three?


@Reznal

  • Input tokens: request, attached files, rules, docs, read files, MCP output …
  • Output tokens: Chat text, Code, data sent to MCP from AI,…
  • Write Cache: previous chat messages/context to be saved for the session
  • Read cache: reading from the cache without having to fetch data from the user’s computer, basically temporary session data

Read cache is the cheapest of all, so that’s not an issue.

Thanks for the details :slight_smile:
Getting an API cost for each token type in each model would be amazing.
Although in most of my prompts of any size the majority of the tokens are Read Cache, and I still get rate limited almost instantly.


Yeah, there is a feature request for this and I forwarded it to Cursor Team.


Okay, it’s fine that Read Cache is the cheapest of all, but if the amount goes berserk it might still be an issue. Have a look at this log:

token-based usage calls to claude-4-sonnet, totalling: $2.62. Input tokens: 1120, Output tokens: 10751, Cache write tokens: 117388, Cache read tokens: 5252615

For 1,120 input tokens and 10,751 output tokens: over 100k cache write and 5+ million cache read tokens? I wonder if this explains why I, as a casual hobby coder, hit the new rate limit on all models in a matter of minutes.
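To put those numbers in perspective, here is a rough cost breakdown using illustrative per-million-token rates. These rates are assumptions for the sketch, not official Cursor or Anthropic pricing, so the total won't exactly match the $2.62 in the log:

```python
# Hypothetical per-million-token rates (illustrative only; real pricing may differ).
RATES = {  # USD per 1M tokens -- ASSUMED values, not official
    "input": 3.00,
    "output": 15.00,
    "cache_write": 3.75,
    "cache_read": 0.30,
}

usage = {  # token counts taken from the log above
    "input": 1_120,
    "output": 10_751,
    "cache_write": 117_388,
    "cache_read": 5_252_615,
}

# Cost per token type: count * rate / 1M
costs = {k: usage[k] * RATES[k] / 1_000_000 for k in usage}
for kind, cost in costs.items():
    print(f"{kind:12s} {usage[kind]:>9,} tokens -> ${cost:.2f}")
print(f"total: ${sum(costs.values()):.2f}")
# Even at the lowest per-token rate, cache_read dominates the total here.
```

Under these assumed rates, the cache reads alone cost more than the other three types combined, which is the pattern being described in this thread.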


@gram12321 is this from a long chat?
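If it is, that would go a long way toward explaining the cache-read volume. Here is a toy model (an assumption about how session caching might work, not Cursor's documented accounting): if each turn re-reads the entire cached prefix and writes only the new context, cache reads grow roughly quadratically with conversation length:

```python
# Toy model (ASSUMPTION, not Cursor's actual accounting): each turn, the whole
# previously cached prefix is read back, and only the new context is written.
def simulate(turn_tokens: int, turns: int) -> tuple[int, int]:
    cache = 0
    reads = writes = 0
    for _ in range(turns):
        reads += cache          # re-read the entire cached prefix
        writes += turn_tokens   # write only this turn's new context
        cache += turn_tokens
    return reads, writes

reads, writes = simulate(turn_tokens=2_000, turns=50)
print(f"cache reads: {reads:,}, cache writes: {writes:,}")
# ~2.45M reads vs 100k writes for a 50-turn chat of 2k tokens per turn
```

Under this assumption, a single long chat easily produces millions of cache-read tokens against only ~100k cache writes, which is the same order of magnitude as the log above.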