I recently referenced Cursor’s current tl;dr privacy policy in this post:
And I often wonder what the relationship is between these settings and actions:
01) Privacy Mode Enabled (which I toggle ON)
02) Codebase Indexing (which I also toggle ON)
Is item 01 negated/overridden if I enable item 02?
My (possibly incorrect) understanding after writing this post is that:
Item 02 does not negate item 01
Item 02 does involve storing the vector embeddings of your code in a vector database, but not the code itself
I’m still not sure why the vector embeddings aren’t considered as important as the code - aren’t they just multidimensional numerical representations of the code text, and therefore could be ‘un-embedded’ by using the same embedding model that was used to embed them?
I’ll leave it to someone with more knowledge than me to provide an authoritative answer.
In regard to your question about enforcing Privacy Mode, I am a Pro user and I can toggle that setting on or off, so I am assuming the Business plan enables admins to enforce this setting for all their users.
Links for additional reference:
For better and more accurate codebase answers…you can index your codebase. Behind the scenes, Cursor computes embeddings for each file in your codebase, and will use these to improve the accuracy of your codebase answers.
It does not! If you choose to index your codebase, Cursor will upload your codebase in small chunks to our server to compute embeddings, but all plaintext code ceases to exist after the life of the request.
The embeddings and metadata about your codebase (hashes, file names) are stored in our database, but none of your code is.
With Privacy Mode, none of your code will ever be stored by us or any third-party (except for OpenAI which persists the prompts we send to them for 30 days for trust and safety, unless you’re on the business plan). Otherwise, we may save prompts / collect telemetry data to improve Cursor.
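To make the quoted description of indexing a bit more concrete, here is a minimal sketch of how I picture the flow: split a file into small chunks, compute an embedding per chunk, and keep only the file name, a hash, and the vector. Everything in it is my own assumption for illustration: `toy_embed`, `IndexRecord`, `index_file`, the chunk size, and the 8-dimensional hashed bag-of-tokens "embedding" are hypothetical stand-ins, not Cursor's actual pipeline or model.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Tuple

CHUNK_LINES = 3  # hypothetical chunk size, chosen only for the demo
DIM = 8          # toy embedding dimension; real models use far more


@dataclass(frozen=True)
class IndexRecord:
    """What this sketch persists: metadata and a vector, no plaintext."""
    file_name: str
    chunk_hash: str
    embedding: Tuple[float, ...]


def toy_embed(text: str) -> Tuple[float, ...]:
    """Stand-in for a real embedding model (an assumption, not Cursor's).

    Counts whitespace-separated tokens into DIM hashed buckets and
    L2-normalises the result.
    """
    counts = [0.0] * DIM
    for token in text.split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        counts[digest[0] % DIM] += 1.0
    norm = sum(c * c for c in counts) ** 0.5 or 1.0
    return tuple(c / norm for c in counts)


def index_file(file_name: str, source: str) -> List[IndexRecord]:
    """Chunk the source, embed each chunk, and keep only hash + vector."""
    lines = source.splitlines()
    records = []
    for start in range(0, len(lines), CHUNK_LINES):
        chunk = "\n".join(lines[start:start + CHUNK_LINES])
        chunk_hash = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        records.append(IndexRecord(file_name, chunk_hash, toy_embed(chunk)))
        # The plaintext chunk is not stored anywhere past this point.
    return records


if __name__ == "__main__":
    demo = "def add(a, b):\n    return a + b\n\nprint(add(1, 2))\n# done"
    for record in index_file("demo.py", demo):
        print(record.file_name, record.chunk_hash[:12], record.embedding)
```

In this toy, `toy_embed("a b")` and `toy_embed("b a")` give the same vector, so the vector alone cannot pin down the exact text. Real learned embeddings keep far more information than this, so I would not read the sketch as evidence that they are impossible to reverse.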
Ah, OK. So “enforcing privacy mode” is a central admin feature to enforce it for all users, got it.
Thank you for your comprehensive analysis, I get quite a good picture from this.
I think this is a sufficient level of security for me. They are a startup, so I’m a bit worried that they may not be very experienced in securing their systems.
But I’m not too concerned that someone would put in the time and effort to reconstruct our code from the embeddings, even if there were a breach and someone got their hands on the data.
It would be a different story if our code were stored in plain text somewhere on their servers.