Whenever I resume a chat that I left overnight, my account is hit with a 10+ request usage event, roughly 10x the size of an average usage event. The size is similar to the first request of a chat, which carries a lot of initial data. It seems Cursor is re-seeding the chat in order to continue it.
Caching lasts 5 minutes with Anthropic, or an hour, but there's no way Cursor is paying for the hour-long option.
By default, the cache has a 5-minute lifetime. The cache is refreshed for no additional cost each time the cached content is used.
5-minute cache write tokens are 1.25 times the base input tokens price
1-hour cache write tokens are 2 times the base input tokens price
Cache read tokens are 0.1 times the base input tokens price
Regular input and output tokens are priced at standard rates. (Source: Prompt caching - Anthropic)
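To put rough numbers on why a resumed chat feels like a 10x usage event, here is a small sketch using the multipliers quoted above. The base input price and the 200k-token context size are assumptions for illustration, not Cursor's actual figures:

```python
# Rough cost comparison (hypothetical numbers): resuming a chat whose cache
# has expired (full cache write) vs. one whose cache is still warm (cache read).
# Assumes a base input price of $3 per million tokens, which is an assumption.

BASE_INPUT = 3.00 / 1_000_000        # $ per input token (assumed base rate)

CACHE_WRITE_5M = 1.25 * BASE_INPUT   # 5-minute cache write multiplier
CACHE_READ     = 0.10 * BASE_INPUT   # cache read multiplier

context_tokens = 200_000             # large seeded chat context (assumed)

cold_resume = context_tokens * CACHE_WRITE_5M  # cache expired: re-write everything
warm_resume = context_tokens * CACHE_READ      # cache alive: cheap read

print(f"cold resume: ${cold_resume:.2f}")
print(f"warm resume: ${warm_resume:.2f}")
print(f"ratio: {cold_resume / warm_resume:.1f}x")
```

Under these assumed numbers, a cold resume costs 12.5x a warm one, which lines up with the "10x of an average usage event" observation above.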
If I’m reading it correctly, the timer is refreshed each time the cache is used, so it might be possible to use tricks to keep it alive, but I doubt Anthropic would like that.
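The "trick" would look something like the following: periodically send a minimal request that reuses the cached prefix before the 5-minute TTL expires, since any cache hit refreshes the lifetime at no extra cost. This is only a sketch; `send_cached_request` is a placeholder for whatever call reuses the cached prefix, not a real Anthropic SDK function:

```python
# Hypothetical keep-alive loop: refresh the 5-minute prompt cache by touching
# the cached prefix before its TTL runs out. Likely against the spirit of the
# pricing, as noted above.
import threading

CACHE_TTL_SECONDS = 5 * 60
SAFETY_MARGIN = 30  # refresh slightly before expiry


def keep_cache_warm(send_cached_request, stop_event: threading.Event) -> None:
    """Ping the API with the cached prefix so the TTL keeps resetting."""
    while not stop_event.is_set():
        send_cached_request()  # any cache hit refreshes the 5-minute lifetime
        # Sleep until just before expiry, or exit early if asked to stop.
        stop_event.wait(CACHE_TTL_SECONDS - SAFETY_MARGIN)
```

In practice you'd run this in a background thread and set `stop_event` when the user actually resumes the chat.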
Ah yes, good point about prompt caching. It's used when you continue working in the same chat so that the whole conversation doesn't consume input tokens again. Though I don't know the caching time.
I don’t know what you call it internally. From my perspective: whenever I come back to a chat after “a while” (I haven’t determined how long I have to wait yet), I’m hit with large usage even though my request after coming back was just something like “continue”.
While I do not have insight into the precise settings, it’s likely that the chat cache (which depends on the AI provider) has timed out, so resubmitting the thread is counted by the provider as a cache write, since the cache is now empty.