Data Retention in the Business Plan

I’m thinking about using your Business plan, but I’d like to clarify your data retention practices.

You say that in privacy mode you won’t retain any data from my repository for longer than the duration of the request, and that OpenAI/Anthropic will do the same.

Is that also true when I let you index all of my code?

And what does “enforcing” privacy mode mean? Is privacy mode also available on other plans?

Thanks in advance for any info.


I’m interested in this as well for the Enterprise package.

What is the Enterprise package? There are only Free, Pro, and Business plans on their Pricing page.

I meant the Business plan.

OK, thanks for the info.

I recently referenced Cursor’s current tl;dr privacy policy in this post:

And I often wonder what the relationship is between these settings and actions:

01) Privacy Mode Enabled (which I toggle ON)

02) Codebase Indexing (which I also toggle ON)

Is item 01 negated/overridden if I enable item 02?

My (possibly incorrect) understanding after writing this post is that:

  • Item 02 does not negate item 01

  • Item 02 does involve storing the vector embeddings of your code in a vector database, but not the code itself

I’m still not sure why the vector embeddings aren’t considered as sensitive as the code itself. Aren’t they just multidimensional numerical representations of the code text, and couldn’t they therefore be ‘un-embedded’ using the same embedding model that produced them?

I googled this:

Can vector embeddings be converted back?

And it led to things like:

  • One OpenAI user’s guess about how embeddings work

  • Another OpenAI user’s reference to a paper that seemingly shows that vector embeddings can be inverted

  • Microsoft’s article on What are Vector Embeddings?

I’ll leave it to someone with more knowledge than me to provide an authoritative answer :slightly_smiling_face:.
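For what it’s worth, here’s a minimal sketch of why you can’t simply run the same embedding model “in reverse” (using the sentence-transformers library purely as a stand-in; Cursor’s actual embedding model isn’t public). Embedding is a one-way, lossy mapping, so the closest thing to ‘un-embedding’ a leaked vector is ranking guessed candidate texts by similarity to it:

```python
# Rough sketch, not an authoritative answer. sentence-transformers is
# a stand-in; Cursor's real model and vector dimensions are unknown.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

code_snippet = "def add(a, b):\n    return a + b"
vec = model.encode(code_snippet)  # text -> 384-dim vector, one-way

# There is no model.decode(vec). An attacker holding only `vec` can
# at best rank candidate texts by cosine similarity to it:
candidates = [
    "def add(a, b):\n    return a + b",
    "def sub(a, b):\n    return a - b",
    "print('hello')",
]
scores = util.cos_sim(vec, model.encode(candidates))
print(candidates[int(scores.argmax())])  # a best guess, not the original bytes
```

That said, the paper mentioned above apparently trains a dedicated inversion model to reconstruct approximate text from vectors, so I wouldn’t call embeddings harmless, just much harder to exploit than plaintext.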

In regard to your question about enforcing Privacy Mode: I am a Pro user and I can toggle that setting on or off, so I assume the Business plan lets admins enforce this setting for all of their users.

Links for additional reference:

For better and more accurate codebase answers…you can index your codebase. Behind the scenes, Cursor computes embeddings for each file in your codebase, and will use these to improve the accuracy of your codebase answers.

https://docs.cursor.com/context/codebase-indexing

Does indexing the codebase require storing code?

It does not! If you choose to index your codebase, Cursor will upload your codebase in small chunks to our server to compute embeddings, but all plaintext code ceases to exist after the life of the request.

The embeddings and metadata about your codebase (hashes, file names) are stored in our database, but none of your code is.

https://docs.cursor.com/miscellaneous/privacy#does-indexing-the-codebase-require-storing-code
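To make the quoted flow concrete, here’s an illustrative sketch of what “embeddings and metadata, but no code” could look like. This is my reading of the docs, not Cursor’s actual code; the chunk size, record schema, and the embed() stand-in are all assumptions:

```python
# Illustrative sketch of the indexing flow the docs describe; the
# chunk size, schema, and embed() stand-in are my assumptions.
import hashlib

def embed(chunk: str) -> list[float]:
    # Stand-in for the server-side embedding model.
    return [b / 255 for b in hashlib.sha256(chunk.encode()).digest()]

def index_file(path: str, text: str, db: list[dict]) -> None:
    chunk_size = 512  # hypothetical chunk length in characters
    for start in range(0, len(text), chunk_size):
        chunk = text[start:start + chunk_size]
        db.append({
            "file": path,                                        # metadata
            "hash": hashlib.sha256(chunk.encode()).hexdigest(),  # metadata
            "vector": embed(chunk),                              # embedding
        })
    # No plaintext field is written: once the request ends, the code
    # itself exists only in the files on the user's own machine.

db: list[dict] = []
index_file("src/main.py", "def main():\n    print('hi')\n", db)
print(db[0].keys())  # dict_keys(['file', 'hash', 'vector'])
```

Presumably, at question time the server matches the question’s embedding against these stored vectors, and the matching plaintext chunks are then read from your local disk rather than from their database.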

With Privacy Mode, none of your code will ever be stored by us or any third-party (except for OpenAI which persists the prompts we send to them for 30 days for trust and safety, unless you’re on the business plan). Otherwise, we may save prompts / collect telemetry data to improve Cursor.

https://docs.cursor.com/miscellaneous/privacy#what-is-privacy-mode

Posts for additional reference:


Thank you! I’ll be in touch about the Enterprise plan.


Ah, OK. So “enforcing privacy mode” is a central admin feature that turns it on for all users, got it.

Thank you for your comprehensive analysis; it gives me quite a good picture.

I think this is a sufficient level of security for me. These guys are a startup, so I’m a bit afraid that they aren’t very experienced when it comes to the IT security of their own systems.

But I’m not too concerned that someone would put in the time and effort to reconstruct our code from the embeddings, even if there were a breach and someone got their hands on the data.

That would be a different story if our code were stored in plaintext somewhere on their servers.

Thanks a lot
