Privacy Policy needs more clarification about how the content is handled and used

From the policy it is not clear how the content data (code and chat) is being handled and used. Here are a few questions:

  • It is not clear whether, and for which data, you are acting as Data Controller and for which as Data Processor. Other than explicitly submitted feedback for product improvement, I would expect you to act as Data Processor for content data (code and chat) that is not used to configure the service itself. Are you acting as Data Processor or as Data Controller for content?

    • The files that I have on my machine and that your software might access include files that contain Personal Data of my users. Since you have no direct relationship with those Data Subjects, I would expect you to simply act as Data Processor here, and not to retain the data beyond the time required for processing my requests. Is that correct?

    • For the unpaid free version, are you essentially saying that we should not use it where we might have Personal Data or other sensitive and confidential data? What about the paid non-enterprise version?

  • The legal basis on which you handle the data is unclear. You are mixing the Legitimate Interest justification with Consent. For which data, and for exactly which purposes, are you claiming Legitimate Interest? It is also not clear where you are collecting Consent from data subjects, for which data, and for what purpose.

  • Retention and usage: it seems that when privacy mode is not enabled, the data will be retained and can be used to improve the product. Is that correct?

  • Is it correct that if privacy mode is enabled, at least in the paid version, the data will not be stored beyond the minimum time required for handling the request on your side, and that you would not use it to train models, either directly or through indirect methods like RLHF and reward models?

  • Are there any service logs on your side where you would dump content from the request/response in some form for troubleshooting? If yes, what are they and how long do you keep them?

  • How can I control which files Cursor accesses on my machine? Is it only going to access files that are open? Only files in the same git repo that is open? Only files in the opened folder and its subfolders? It is not clear what data Cursor is accessing and potentially sending to your servers.

  • Is the repo indexing done on your server side or only locally on my machine? Are you essentially keeping an index of our repo on your servers? If so, how long do you keep it? Is there a TTL on the storage? What controls do we have over those indexes? How can we see what you have and request deletion?

  • Is there a way for us to request a copy of previously stored content from our account and ask for its deletion? E.g. if we forget to enable Privacy Mode by mistake and end up sending sensitive data in non-Privacy Mode, is there a way to get that data deleted? How long do we have to request deletion in such a case before you might use it for training models and before it reaches other places where it might be hard to delete? If that happens, how long after we request deletion would it take for the data to disappear from your models and storage systems?

  • Is your usage of services from OpenAI and Anthropic covered by a pure sub-processor agreement under which they will not use the content data for any purpose other than serving the request?

One more question:

  • When using server-side embeddings, are you ensuring that our embeddings are isolated from the embeddings of other users?
  • Embeddings carry a lot of information about our content and can be used to reconstruct the content, or parts of it, from which they were computed. So from a privacy perspective, we would expect them to be treated like the content they are computed from.
  • Is there a way to make Cursor compute and store the embeddings locally or on-prem, so that they do not need to be sent to your servers?