Configurable Rate Limiting for Agents with User-Provided API Keys

Problem Statement

When using a user-provided model API key with Cursor’s Agent, users frequently encounter “User API Key Rate limit exceeded” errors, particularly with high-volume tasks in Agent mode. These errors occur when the Agent submits a large number of requests within a short period, exceeding the API provider’s rate limits. Currently, Cursor does not enforce client-side rate limits on user-provided keys; it relays the provider’s error without mitigation, which halts the Agent’s work and risks losing progress. This disrupts workflows, particularly for power users who rely on premium models.

Proposed Feature

Introduce a configurable rate-limiting system for Cursor’s Agent when using user-provided API keys, with options for automatic detection of model-specific limits or manual configuration. Additionally, implement robust error handling to retry requests and recover from rate-limit errors without losing work.

Key Components

  1. Configurable Rate Limiting:
    • Automatic Detection: Cursor detects the API provider’s rate limits (e.g., requests per minute [RPM], input tokens per minute [ITPM], output tokens per minute [OTPM]) for the selected model by reading the provider’s rate-limit response headers (e.g., Anthropic’s) or consulting published documentation. Detected limits are applied to throttle Agent requests.
    • Manual Configuration: Users can specify custom limits per model in Cursor’s settings (e.g., max RPM, ITPM, OTPM) to align with their API plan or preferences.
    • Delay Mechanism: Introduce a configurable delay between requests (e.g., 100ms–1s) to spread out Agent activity, preventing bursts that trigger rate limits.
  2. Enhanced Error Handling and Recovery:
    • Retry with Exponential Backoff: When a 429 error occurs, Cursor retries the request using exponential backoff (e.g., wait 2^attempt seconds) to reduce server load. Respect the API’s “Retry-After” header if provided.
    • Progress Preservation: Cache intermediate Agent outputs locally to prevent loss of work during rate limit interruptions. Resume from the last successful request after the retry succeeds.
    • User Notifications: Display a non-disruptive notification (e.g., in the editor or status bar) when nearing rate limits, with options to pause, switch models, or enable usage-based pricing.
  3. Settings Integration:
    • Add a “Rate Limiting” section in Cursor’s API key settings, allowing users to toggle automatic detection, set manual limits, or adjust retry behaviour.
    • Provide model-specific presets (e.g., Claude 3.5 Sonnet: 500 RPM, 50,000 ITPM) for common providers to simplify setup.
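The throttling behaviour in component 1 could be sketched as a client-side limiter that combines a sliding-window RPM cap with the configurable inter-request delay. This is an illustrative sketch, not Cursor’s actual implementation; the class and method names are invented for the example:

```python
import time
from collections import deque


class RequestThrottle:
    """Client-side throttle: sliding-window RPM cap plus a minimum
    inter-request delay (names are illustrative, not Cursor's API)."""

    def __init__(self, max_rpm=500, min_delay=0.1):
        self.max_rpm = max_rpm
        self.min_delay = min_delay   # configurable delay between requests, seconds
        self.timestamps = deque()    # send times within the trailing 60 s window

    def wait_time(self, now=None):
        """Seconds to wait before the next request may be sent."""
        now = time.monotonic() if now is None else now
        # Drop send times that have fallen out of the 60-second window.
        while self.timestamps and now - self.timestamps[0] >= 60:
            self.timestamps.popleft()
        wait = 0.0
        if self.timestamps:
            # Enforce the minimum spacing between consecutive requests.
            wait = max(wait, self.min_delay - (now - self.timestamps[-1]))
            # If the RPM window is full, wait until the oldest entry expires.
            if len(self.timestamps) >= self.max_rpm:
                wait = max(wait, 60 - (now - self.timestamps[0]))
        return wait

    def record(self, now=None):
        """Call after each request is actually sent."""
        self.timestamps.append(time.monotonic() if now is None else now)
```

A caller would loop: sleep for `wait_time()`, send the request, then `record()` it. The same window structure extends naturally to ITPM/OTPM by tracking token counts alongside timestamps.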
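The retry logic in component 2 could look like the following: exponential backoff on 429s, honouring the provider’s Retry-After value when present. Again a sketch under stated assumptions; the exception type and function names are hypothetical:

```python
import time


class RateLimitError(Exception):
    """Stand-in for a provider 429; may carry a Retry-After value in seconds."""

    def __init__(self, retry_after=None):
        super().__init__("rate limit exceeded")
        self.retry_after = retry_after


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Delay before retry `attempt` (0-based): honour Retry-After when the
    provider sends it, otherwise wait base * 2^attempt seconds, capped."""
    if retry_after is not None:
        return float(retry_after)
    return min(cap, base * (2 ** attempt))


def call_with_retries(request_fn, max_retries=5, sleep=time.sleep):
    """Run request_fn(), retrying on RateLimitError with backoff.
    `request_fn` and `sleep` are injected to keep the sketch testable."""
    for attempt in range(max_retries + 1):
        try:
            return request_fn()
        except RateLimitError as err:
            if attempt == max_retries:
                raise  # out of retries; surface the provider's error
            sleep(backoff_delay(attempt, err.retry_after))
```

Progress preservation would wrap this: each successful `request_fn()` result is cached locally, so a failed retry sequence resumes from the last cached step rather than restarting the task.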

Benefits

  • Prevent Disruptions: Rate limiting reduces 429 errors, ensuring smoother Agent operation for complex tasks like multi-file edits or composer workflows.
  • User Control: Manual and automatic options cater to both novice and advanced users, aligning with diverse API plans (e.g., free tiers and Pro accounts).
  • Work Preservation: Enhanced retry and recovery mechanisms prevent loss of progress, improving reliability for long-running tasks.
  • Cost Efficiency: By avoiding excessive requests, users stay within their API quotas, reducing unexpected costs or the need for usage-based pricing.
  • Centralised Usage Monitoring and Billing: Because requests go through the user’s own key, usage and spend remain visible in each provider’s console across services, rather than being “hidden” inside Cursor.

Implementation Considerations

  • API Provider Compatibility: Ensure compatibility with major providers (OpenAI, Anthropic, Google) by parsing rate limit headers or maintaining a database of known limits.
  • Performance Trade-Offs: Balance request delays with Agent responsiveness to avoid sluggish performance. Allow users to adjust delay sensitivity.
  • UI/UX: Seamlessly integrate settings into the existing API key dashboard, with clear tooltips explaining the difference between automatic and manual modes.
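The header-parsing approach mentioned above could be sketched as a small normaliser over provider response headers. The header names below follow OpenAI’s and Anthropic’s documented conventions, but they should be verified against each provider’s current documentation before being relied on:

```python
def parse_rate_limit_headers(headers):
    """Normalise provider rate-limit response headers into one dict.
    Covers OpenAI-style and Anthropic-style names; header conventions
    are assumptions to verify against current provider docs."""
    h = {k.lower(): v for k, v in headers.items()}
    out = {}
    # OpenAI-style headers.
    if "x-ratelimit-limit-requests" in h:
        out["rpm_limit"] = int(h["x-ratelimit-limit-requests"])
        out["rpm_remaining"] = int(h.get("x-ratelimit-remaining-requests", 0))
    # Anthropic-style headers.
    if "anthropic-ratelimit-requests-limit" in h:
        out["rpm_limit"] = int(h["anthropic-ratelimit-requests-limit"])
        out["rpm_remaining"] = int(h.get("anthropic-ratelimit-requests-remaining", 0))
    return out
```

A fallback database of known per-model limits would cover providers that omit these headers.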