Codebase indexing vs. chat with codebase

So I've got a question.
1. You can already chat with the codebase without indexing being enabled in the settings. Is that right?
2. What's the advantage of enabling it? A better answer? How big is the difference? Is it worth it?
3. Do they work differently? (Codebase indexing vectorizes and embeds the code, while chat with codebase just searches through the files in the workspace.)
I'm asking because if chatting with the codebase and the entire workspace already works well without it, then it might not be worth enabling the codebase indexing option in the settings.
So yeah, I'd like to know the difference and the use cases for these features.

Yep! You can chat with your codebase whether or not you index it.

If you don’t use indexing, we fall back on a simpler, entirely local, and worse method for figuring out which parts of the codebase to show GPT-4 to answer your codebase-wide question.
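
The reply doesn't spell out what that fallback does, but as a hypothetical sketch (not Cursor's actual method), a simple, entirely local, non-embedding retrieval could score workspace files by keyword overlap with the question. The extension filter and scoring weights below are invented for illustration:

```python
import re
from collections import Counter
from pathlib import Path

def tokenize(text: str) -> list[str]:
    # Lowercased identifier-ish tokens of length >= 2.
    return re.findall(r"[A-Za-z_]\w+", text.lower())

def top_files(workspace: str, question: str, k: int = 5) -> list[str]:
    """Return up to k files whose contents best match the question."""
    query = Counter(tokenize(question))
    scored = []
    for path in Path(workspace).rglob("*"):
        if path.suffix not in {".py", ".ts", ".js", ".go"}:  # invented filter
            continue
        try:
            words = Counter(tokenize(path.read_text(encoding="utf-8")))
        except (OSError, UnicodeDecodeError):
            continue
        # Cap per-term frequency so one giant file doesn't dominate.
        score = sum(min(words[t], 3) * c for t, c in query.items())
        if score > 0:
            scored.append((score, str(path)))
    return [p for _, p in sorted(scored, reverse=True)[:k]]
```

A real fallback is likely smarter than this, but anything along these lines avoids both the embedding model and the vector store entirely.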

What about local embedding database tools/methods?
There are many of them out there.
What do you guys think about that approach?
That way, even the vectorizing and embedding could happen locally.

We worry that entirely local embeddings would:

  1. be quite a resource hog (both from the model inference and from the vector store, especially for folks who are on older PCs)
  2. limit the quality of the vector embeddings we could ship, by limiting the size of our embedding model

In general, with Cursor, our philosophy is to focus our limited engineering bandwidth on pushing the AI as far as possible, which does mean reducing the resources spent on things like an entirely local experience.
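
To make the resource concerns above concrete, here is a minimal sketch of what a fully local pipeline can look like, assuming the open-source sentence-transformers library and its small all-MiniLM-L6-v2 model (purely illustrative; this is not something Cursor ships):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Even this deliberately small model pulls in ~80 MB of weights and
# runs inference on the local CPU/GPU; larger, higher-quality models
# cost proportionally more, which is the tradeoff described above.
model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = [
    "def add(a, b): return a + b",
    "class UserRepository: ...",
]
vectors = model.encode(chunks, normalize_embeddings=True)

query = model.encode(["where is addition implemented?"],
                     normalize_embeddings=True)[0]
scores = vectors @ query  # cosine similarity (vectors are normalized)
print(chunks[int(np.argmax(scores))])
```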

Hello, new user here. I'm in the phase of comparing this new AI-first IDE, Cursor, with other options. How does Cursor actually index my codebase?

We split it into syntactically relevant chunks (using tree-sitter), then store the embeddings in our vector database, while never storing any of your code on our servers.
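
As a rough illustration of syntax-aware chunking (a sketch, not Cursor's code), here is what splitting a Python file into top-level function and class chunks can look like, assuming recent versions of the tree_sitter and tree_sitter_python packages:

```python
import tree_sitter_python
from tree_sitter import Language, Parser

PY_LANGUAGE = Language(tree_sitter_python.language())
parser = Parser(PY_LANGUAGE)

def chunk_source(source: bytes) -> list[str]:
    """Return each top-level function/class definition as one chunk."""
    tree = parser.parse(source)
    return [
        source[node.start_byte:node.end_byte].decode()
        for node in tree.root_node.children
        if node.type in ("function_definition", "class_definition")
    ]

code = b"def add(a, b):\n    return a + b\n\nclass Repo:\n    pass\n"
for chunk in chunk_source(code):
    print("---")
    print(chunk)
```

Chunking on syntax nodes like this keeps each embedding aligned with a meaningful unit of code rather than an arbitrary window of lines.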

We use the local state of your codebase as the source of truth for the text corresponding to a given vector in the database.
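
A hypothetical sketch of what that source-of-truth arrangement could look like: the vector store keeps only a reference to where a chunk lives (the ChunkRef type and its fields are invented for illustration), and the actual text is re-read from your local disk when it's needed:

```python
from dataclasses import dataclass

@dataclass
class ChunkRef:
    path: str        # file in the local workspace
    start_byte: int  # span of the chunk within that file
    end_byte: int

def resolve(ref: ChunkRef) -> str:
    """Recover chunk text from the local codebase, not from any server."""
    with open(ref.path, "rb") as f:
        data = f.read()
    return data[ref.start_byte:ref.end_byte].decode("utf-8", errors="replace")
```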

You do this for the entire codebase when I open a project with Cursor and enable indexing? Then when I close the folder or window in Cursor, is that vector database deleted? That seems really expensive to offer at cost to users who provide their own OpenAI API key. Or do those users not get the full experience of having their entire codebase indexed into the vector database? Do you have any plans to let users see the vector databases and choose a sharding scheme, or manually adjust parameters for better indexing?
