Curious what each token type actually is?
Also curious if we know anything about the cost of each type and while not knowing the limits, if certain tokens use up that limit further and by how much?
As far as my understanding goes:
Input: Prompt
Output: The output of the LLM
Write Cache: Writing the prompt to a (server-side?) cache
Read Cache: Reading from that cache instead of reprocessing something previously discussed
I’m curious about this because my limits get hit so quickly. Looking at my Usage, the majority of the tokens I’m using are Read Cache, but I don’t think those should really affect the rate limits as much as the other 3?
Thanks for the details
Getting an API cost for each token type in each model would be amazing.
Although for most of my prompts of any size, the majority of the tokens are Read Cache, and I still get rate limited nearly instantly.
For 1120 input tokens and 10751 output tokens: over 100k cache write and 5+ million cache read tokens??? I wonder if this explains why I, as a casual hobby coder, hit the new rate limit on all models in a matter of minutes.
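For a rough sense of how those four token types weigh against each other in API terms, here is a small sketch. The per-million-token rates below are assumptions taken from publicly listed Claude 3.5 Sonnet API pricing at the time of writing (cache writes ~1.25x the input rate, cache reads ~0.1x), and may not match current pricing or how subscription rate limits are actually weighted:

```python
# ASSUMED rates in USD per million tokens (Claude 3.5 Sonnet-class pricing,
# as publicly listed at time of writing -- verify against the current pricing page).
RATES_PER_MTOK = {
    "input": 3.00,        # regular prompt tokens
    "output": 15.00,      # generated tokens
    "cache_write": 3.75,  # writing a prompt prefix into the cache (~1.25x input)
    "cache_read": 0.30,   # reading a cached prefix back (~0.1x input)
}

def api_cost(usage: dict) -> float:
    """Approximate USD cost for a token-usage breakdown."""
    return sum(count * RATES_PER_MTOK[kind] / 1_000_000
               for kind, count in usage.items())

# The numbers from the post above:
usage = {
    "input": 1_120,
    "output": 10_751,
    "cache_write": 100_000,
    "cache_read": 5_000_000,
}
print(f"${api_cost(usage):.2f}")
```

With these assumed rates the 5M cache-read tokens come to about $1.50, more than the input, output, and cache writes combined, so even at a ~10x discount per token, cache reads can easily dominate when every turn re-reads a large cached context.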