Privacy and Data Anonymization Features in Cursor

Dear All,

I have reviewed the Cursor privacy policy, terms of service, and discussions regarding privacy mode and data handling. I appreciate the thoughtful measures in place, such as ensuring no code is stored when Privacy Mode is enabled, the strict third-party data retention limitations, and the deletion of plaintext code after processing during indexing. These efforts clearly reflect a commitment to user data privacy.

However, I have a specific question about anonymization capabilities. Is Cursor capable of anonymizing identifiable terms within both the codebase and the prompts submitted outside the user’s machine? By anonymization, I mean handling parts of the code, comments, or metadata that are not generically named and could contain identifiable information about the code owner, such as:

  • IP addresses,
  • Application names,
  • AWS account IDs,
  • Cluster addresses,
  • Configurations or tokens related to an individual, organization, or location.

A pre-processing feature to map or mask such terms locally on the user’s machine—before they are sent out for model inference—would provide an additional layer of protection. This could involve replacing sensitive terms with non-identifiable placeholders while maintaining functionality for model interactions.
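As a rough illustration of what I have in mind, here is a minimal sketch of such a local pre-processing step. Everything in it is hypothetical: the patterns, placeholder format, and helper functions are my own assumptions and are not part of any existing Cursor feature or API.

```python
import re

# Hypothetical sketch only: none of these names are Cursor APIs. Sensitive terms are
# replaced with stable placeholders before a prompt leaves the machine, and the mapping
# stays local so model responses can be rehydrated afterwards.

SENSITIVE_PATTERNS = {
    "IP_ADDR": re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"),              # IPv4 addresses
    "AWS_ACCOUNT_ID": re.compile(r"\b\d{12}\b"),                        # 12-digit AWS account IDs
    "CLUSTER_HOST": re.compile(r"\b[\w.-]+\.internal\.example\.org\b"), # assumed cluster naming scheme
}

def mask(text: str) -> tuple[str, dict[str, str]]:
    """Replace sensitive matches with placeholders; return masked text and a local-only mapping."""
    mapping: dict[str, str] = {}
    counter = 0

    def substitute(label: str):
        def _sub(match: re.Match) -> str:
            nonlocal counter
            placeholder = f"<{label}_{counter}>"
            counter += 1
            mapping[placeholder] = match.group(0)
            return placeholder
        return _sub

    for label, pattern in SENSITIVE_PATTERNS.items():
        text = pattern.sub(substitute(label), text)
    return text, mapping

def unmask(text: str, mapping: dict[str, str]) -> str:
    """Restore the original values in a model response before showing it to the user."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text

masked_prompt, local_map = mask("Deploy the billing app to 10.0.12.7 under account 123456789012")
# masked_prompt is what would leave the machine; local_map never does.
```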

While I recognize and value the privacy practices already in place, I believe this feature could further enhance trust and utility, especially for users in environments where compliance and confidentiality are critical concerns. Could you let me know if such anonymization capabilities currently exist in Cursor or if they are being considered for future development?

I am curious about your thoughts. Thank you for your attention, and I look forward to your insights.

Best,
Mehdi

Just for reference: I found a useful post that collects multiple pieces related to data/code privacy in one of its responses, thankfully by @litecode: Data Retention in the Business Plan - #2 by msc


They probably cannot anonymize the data fully, but they could use services to redact the most sensitive categories of identifiers, though even that will not be perfect and a small percentage will not get redacted.

There is no foolproof automatic de-identification system, but it can be a good risk mitigation.

They can use a service like

on the data before passing it to other systems. Azure and AWS have similar services.
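As one concrete example on the AWS side, here is a minimal sketch using Comprehend's PII detection (detect_pii_entities via boto3). It assumes credentials are already configured and the text fits within the per-request size limit; the redaction logic is only illustrative.

```python
import boto3

# Minimal sketch, assuming boto3 is configured with credentials and the text fits
# within Comprehend's per-request size limit; the redaction logic is illustrative.
comprehend = boto3.client("comprehend", region_name="us-east-1")

def redact_pii(text: str) -> str:
    response = comprehend.detect_pii_entities(Text=text, LanguageCode="en")
    # Replace detected spans from the end of the string backwards so that the
    # earlier character offsets remain valid as the text is edited.
    for entity in sorted(response["Entities"], key=lambda e: e["BeginOffset"], reverse=True):
        start, end = entity["BeginOffset"], entity["EndOffset"]
        text = text[:start] + f"[{entity['Type']}]" + text[end:]
    return text

print(redact_pii("Contact jane.doe@example.com from host 203.0.113.10"))
```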

There are also some on-prem solutions, like http://private.ai, that they could potentially integrate with.
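Microsoft Presidio is another open-source option in this space that can run entirely on-prem; I am mentioning it only as an illustration, not something the Cursor team has confirmed using. A minimal sketch, assuming presidio-analyzer, presidio-anonymizer, and the spaCy model they rely on are installed:

```python
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

# Runs entirely locally once the Presidio packages and their spaCy model are
# installed; no text leaves the machine during analysis or anonymization.
analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

text = "My name is Jane and the cluster IP is 203.0.113.10"
results = analyzer.analyze(text=text, language="en")
print(anonymizer.anonymize(text=text, analyzer_results=results).text)
```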