Cursor high token usage

Hey, good question. No, Cursor doesn’t send your whole project. But in Agent mode, every tool call and every follow-up message is a separate API call, and each one has to include the full chat history, the system prompt, earlier tool call results, and so on. When you’re working with a big open source project, that context grows fast.
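To make the growth concrete, here is a rough sketch of how input tokens add up when every call resends the whole context. The numbers (a 5,000-token starting context, 2,000 tokens added per tool call, 20 calls) and the function name are made up for illustration and are not Cursor's real figures.

```python
def input_tokens_per_call(base_tokens: int, growth_per_call: int, n_calls: int) -> list[int]:
    """Input size of each API call when every call resends the full context.

    base_tokens: system prompt + first message + attached files (assumed value)
    growth_per_call: tokens added to the history by each tool call (assumed value)
    """
    return [base_tokens + i * growth_per_call for i in range(n_calls)]


# Hypothetical session: 20 calls, context grows by 2,000 tokens per call.
sizes = input_tokens_per_call(base_tokens=5_000, growth_per_call=2_000, n_calls=20)
total_input = sum(sizes)

print(f"first call: {sizes[0]} tokens")
print(f"last call:  {sizes[-1]} tokens")
print(f"total input across all calls: {total_input} tokens")
```

Under these assumptions the last call alone is 43,000 tokens, and the session totals 480,000 input tokens even though the conversation itself never exceeds 43,000. The total grows roughly with the square of the number of calls, which is why long Agent sessions get expensive.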

As for cache write and cache read: that’s prompt caching on the provider side (Anthropic or Google), and it actually lowers your costs rather than raising them. Cache read is about 10x cheaper than normal input tokens. Without caching, those same tokens would be billed as full-price input, and the cost would be much higher.
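Here is a back-of-the-envelope comparison, as a sketch only. It assumes a hypothetical price of $3 per million input tokens, cache reads at 0.1x that price (the "about 10x cheaper" above), and cache writes at 1.25x. The write multiplier and the simplified model (everything already sent is read from cache, only new tokens are written) are my assumptions; actual provider pricing and Cursor billing may differ, and output tokens are ignored.

```python
def cost_without_cache(sizes: list[int], price_per_mtok: float) -> float:
    """Every token of every call is billed as normal input."""
    return sum(sizes) * price_per_mtok / 1_000_000


def cost_with_cache(
    sizes: list[int],
    price_per_mtok: float,
    read_mult: float = 0.1,    # cache read: ~10x cheaper than normal input
    write_mult: float = 1.25,  # cache write: assumed small premium over normal input
) -> float:
    """Previously sent tokens are read from cache; only new tokens are written."""
    cost = 0.0
    cached = 0
    for size in sizes:
        new_tokens = size - cached
        weighted = cached * read_mult + new_tokens * write_mult
        cost += weighted * price_per_mtok / 1_000_000
        cached = size
    return cost


# Hypothetical session: 20 calls, context grows from 5,000 to 43,000 tokens.
sizes = [5_000 + i * 2_000 for i in range(20)]
price = 3.0  # assumed $ per million input tokens

no_cache = cost_without_cache(sizes, price)
with_cache = cost_with_cache(sizes, price)
print(f"without caching: ${no_cache:.2f}")
print(f"with caching:    ${with_cache:.2f}")
```

With these made-up numbers the session costs about $1.44 without caching and about $0.29 with it. So the cache read and cache write lines on your usage page are where the savings show up, not an extra charge on top.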

A detailed explanation from our team, with example calculations, is in this thread: “Someone please explain - Why are cache read and write chargeable?” (post #8 by condor)

A few tips to reduce token usage:

  • Start a new chat for each new task, since long chats build up context
  • Only attach the files you actually need in context
  • For simple tasks, use cheaper models
  • If you’re in Agent mode, keep an eye on tool calls, since each one resends the full context

Which model and which mode are you using, Agent or Ask? That’ll help me give more specific advice.