Devs and Cursor team: Is PRO plan ok now?

Hey, thanks for the queries here.

While we’ve not yet posted the structure of our rate limiting system, I can explain how the models work!

The impact a request has on your rate limits is closely correlated with how much it would cost via an LLM provider’s API:

  • How long the specific message is
  • How long the past conversation is
  • How many files are attached
  • How much the specific model costs (thinking = higher cost)
  • Whether MAX mode (a longer context cap) is enabled
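To make the factors above concrete, here’s a rough sketch of how an impact estimate like this could be computed. Every name and number in it is made up for illustration; it is not Cursor’s actual formula, only a picture of how impact scales with input size and model cost:

```python
# Hypothetical sketch: rate-limit impact as (total input size) x (model cost
# multiplier). All model names and multipliers here are invented examples.

# Assumed per-model cost multipliers (thinking models cost more).
MODEL_MULTIPLIER = {
    "claude-4-sonnet-thinking": 3.0,
    "o3": 1.0,
    "gpt-4.1": 0.0,       # unlimited: no rate-limit impact
    "gemini-flash": 0.0,  # unlimited: no rate-limit impact
}

def estimate_impact(model: str,
                    message_tokens: int,
                    history_tokens: int,
                    attached_file_tokens: int,
                    max_mode: bool = False) -> float:
    """Rough impact estimate: total input tokens times a model multiplier."""
    total_tokens = message_tokens + history_tokens + attached_file_tokens
    multiplier = MODEL_MULTIPLIER.get(model, 1.0)
    if max_mode:  # MAX mode raises the context cap, so requests cost more
        multiplier *= 2.0
    return total_tokens * multiplier

# A long thinking-model request has far more impact than a short o3 one,
# and an unlimited model has none at all:
print(estimate_impact("claude-4-sonnet-thinking", 500, 4000, 2000, max_mode=True))
print(estimate_impact("o3", 500, 0, 0))
print(estimate_impact("gpt-4.1", 500, 4000, 2000))
```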

While the new system doesn’t match or reuse our old request-based pricing, the old request counts can still be a rough indicator of a model’s relative cost!

Claude 4 Sonnet Thinking is quite an expensive model with a decent context window, which makes it a higher-cost option - it previously used 2 requests under the old plan.

o3 is slower but still quite intelligent, and has a lower usage cost - previously 1 request - so it will have a lower impact on your usage.

GPT-4.1 and Gemini Flash are both totally unlimited, and don’t touch your rate limits at all.

Finally, I’d highly recommend people use Auto, as this always provides a “premium” model (Claude 4, Gemini 2.5 Pro, etc) but at a significantly reduced effect on your rate limits - this is because we can intelligently route to models with lower usage and utilisation!
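The routing idea behind Auto can be pictured with a tiny sketch - the model pool and load figures below are entirely made up, and the real router is certainly more sophisticated, but it shows the core trick of steering requests toward premium models that currently have spare capacity:

```python
# Hypothetical illustration of lowest-utilisation routing among premium
# models. Model names and load values are invented for this example.

# Assumed pool of "premium" models with current utilisation (0.0 - 1.0).
model_load = {
    "claude-4-sonnet": 0.9,
    "gemini-2.5-pro": 0.4,
    "o3": 0.6,
}

def auto_route(loads: dict[str, float]) -> str:
    """Pick the premium model with the lowest current utilisation."""
    return min(loads, key=loads.get)

print(auto_route(model_load))  # routes to the least-loaded premium model
```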
