Codebase Indexing

Hey there, nice editor.
How does the Codebase Indexing feature work without storing any of the code on the servers? And are the embeddings made through the API?
Thanks!

Hey! The codebase indexing feature works by:

  1. Chunking your codebase into small pieces locally
  2. Sending each piece to our server, which then embeds the code (either with OpenAI’s embedding API or with a custom embedding model)

The embeddings are stored in a remote vector DB, along with starting / ending line numbers and the relative path to that file. None of your code is stored in our databases. It’s gone after the life of the request.
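To make the description above concrete, here is a minimal sketch of local chunking and of the kind of record a remote vector DB might hold. This is not Cursor's actual implementation: the names `Chunk`, `chunk_text`, and `index_record`, the fixed 40-line chunk size, and the record layout are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    relative_path: str  # path relative to the project root
    start_line: int     # 1-indexed, inclusive
    end_line: int       # 1-indexed, inclusive
    text: str           # sent for embedding; not meant to be persisted remotely


def chunk_text(relative_path: str, text: str, max_lines: int = 40) -> list[Chunk]:
    """Split a file's text into fixed-size line windows (hypothetical strategy)."""
    lines = text.splitlines()
    chunks = []
    for start in range(0, len(lines), max_lines):
        block = lines[start:start + max_lines]
        chunks.append(
            Chunk(relative_path, start + 1, start + len(block), "\n".join(block))
        )
    return chunks


def index_record(chunk: Chunk, embedding: list[float]) -> dict:
    """Build the stored record: the vector plus location metadata, no code text."""
    return {
        "embedding": embedding,
        "relative_path": chunk.relative_path,
        "start_line": chunk.start_line,
        "end_line": chunk.end_line,
    }
```

The point of the sketch is that the stored record carries only the vector, the line range, and the relative path, so the client can map a search hit back to local code without the server keeping the text.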

You can turn off codebase indexing by going into settings (gear in the top right or “Cursor Settings” in the command palette).

Would be helpful to know if you’d prefer we do anything differently here. Happy to answer any questions.

So it’s correct to assume that when Local mode is enabled, codebase indexing will still persist the vectors in a remote DB?

Yep.

If you prefer, you can turn off indexing in Cursor settings (command + shift + p, “cursor settings”). We also give people an option to turn off indexing in our onboarding flow.

I don’t think they can do much with embedding vectors of our codebases.

That’s using Cursor’s access to OpenAI’s embedding API, not the user’s API key … right?

Yep

If we are using a private OpenAI key with local mode enabled, where is the vector database stored when doing codebase indexing?

If (with both of those settings enabled) it is still being stored on Cursor’s servers, that should be changed so it is stored locally. I would love some more information on that.

An additional suggestion for the future: a page on the site with a data storage location matrix would be very helpful. For sensitive code, we need to understand where it’s being stored at rest versus only temporarily in transit, etc.

Thanks!!

PS. Absolutely loving Cursor. You guys are crushing it!

It would be useful to be able to exclude files from indexing for security reasons, even if they are not sent to Cursor’s servers and are only sent as embeddings.

The vector DB will always be remote, though again no code is stored in it (if you turn on local mode, none of your code will be stored at rest by us).

Here’s how you can turn off indexing.

I like this!

Like this idea too. Right now, we have a local heuristic scrubber that blocks any secrets/keys from being sent off your computer (for indexing, chat, and Command + K). But it would be good to allow for more control here.
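For readers wondering what a local heuristic scrubber could look like, here is a rough sketch. It is not Cursor's scrubber: the patterns, the `contains_secret` and `filter_chunks` helpers, and the choice to drop a whole chunk on a match are assumptions for illustration, and a real scrubber would likely use many more rules.

```python
import re

# Hypothetical patterns, for illustration only.
SECRET_PATTERNS = [
    # PEM-style private key header
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    # Shape of an AWS access key ID
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Assignments such as api_key = "..." or token: "..."
    re.compile(
        r"(?i)\b(api[_-]?key|secret|token|password)\b\s*[:=]\s*['\"]?[^\s'\"]{8,}"
    ),
]


def contains_secret(text: str) -> bool:
    """Return True if any pattern matches somewhere in the text."""
    return any(p.search(text) for p in SECRET_PATTERNS)


def filter_chunks(chunks: list[str]) -> list[str]:
    """Keep only chunks that do not look like they contain a secret."""
    return [c for c in chunks if not contains_secret(c)]
```

Running the check locally, before anything is sent, is what keeps a matched secret from leaving the machine. Being heuristic, a filter like this can both miss secrets and flag harmless code.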

Where is the option to turn off indexing completely?
I’m not seeing it.
When I open or create a project, it automatically starts indexing,
and I have to remove it “after” it has synced.
What if there were a setting to “completely” turn it off, so you only use the “+ index” option whenever you need it?

If you upgrade to the latest version of Cursor, there should be a button that says Advanced in the bottom left under indexing. If you click that, it’ll show you a toggle to turn off indexing.

Isn’t this the latest update? (screenshot attached)

Ah, apparently this toggle would only show up if you had a Git repo. Should be fixed in 0.8.5 — thank you for letting us know!

Does what you wrote apply to Free plan users, or is it only for Business plan subscribers?

For reference, here is the related docs link for Codebase Indexing:

https://docs.cursor.com/context/codebase-indexing