Cross-sectional analysis across multiple repos

Hi

If I want to analyze code across multiple repos, can I create a workspace containing multiple folders so that Codebase can analyze files from all of them?

If anyone has any good ideas, please let me know.

*However, since Codebase’s reference area is limited, I think I would need to use it in combination with LangChain or something similar.
*By the way, I am a non-programmer.

If you use the “Add Folder to Workspace…” option to add a second folder to your workspace, the Codebase context features still only use the codebase of the first folder in your workspace. But with the “@” symbol, you can still reference files from both folders. This seems to be a bug because the advanced codebase context settings provide the option to use “all” or a specific folder. We’ll investigate.

However, if you put both folders into one folder and open that one parent folder in Cursor, the codebase context features can use both folders simultaneously. So I would recommend that you do that.

・Thanks for the suggestion. I’ll do as you suggested.
・On the other hand, I’m wondering whether GPT really has that much memory. It can take in large amounts of information, but can it actually process all of it?
・Does Cursor add its own storage layer, for example through LangChain, compared to the web version of GPT?

The AI itself can only remember a limited amount of text at once. However, what codebase indexing does is make your codebase highly searchable in a smart manner. When you ask a question, it retrieves the most relevant snippets from your codebase related to your question and presents them within the limited context of the AI. This way, the AI only sees what it needs to answer your question, and its memory space doesn’t need to be excessively large.
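To make the retrieval idea concrete, here is a minimal Python sketch. It is not Cursor's implementation: the snippet table, the function names, and the keyword-overlap scoring are all assumptions made up for illustration, standing in for whatever similarity search a real index uses. It only shows the general shape: score every snippet against the question, keep the best few, and put just those into the prompt.

```python
import re

# Hypothetical mini "codebase": file name -> snippet text (illustration only).
SNIPPETS = {
    "auth/login.py": "def login(user, password): check password hash and create session",
    "billing/invoice.py": "def create_invoice(order): compute tax and total",
    "auth/session.py": "def create_session(user): store session token",
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word-like tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def score(query: str, snippet: str) -> int:
    """Count the distinct query tokens that also appear in the snippet."""
    snippet_tokens = set(tokenize(snippet))
    return sum(1 for token in set(tokenize(query)) if token in snippet_tokens)


def retrieve(query: str, snippets: dict[str, str], limit: int = 2) -> list[str]:
    """Return the names of the best-matching snippets, skipping zero scores."""
    ranked = sorted(
        snippets.items(),
        key=lambda item: score(query, item[1]),
        reverse=True,
    )
    return [name for name, text in ranked[:limit] if score(query, text) > 0]


def build_prompt(query: str, snippets: dict[str, str], limit: int = 2) -> str:
    """Assemble a prompt that contains only the retrieved snippets."""
    parts = [f"# {name}\n{snippets[name]}" for name in retrieve(query, snippets, limit)]
    return "\n\n".join(parts + [f"Question: {query}"])


if __name__ == "__main__":
    print(build_prompt("how does login create a session", SNIPPETS))
```

The billing snippet never reaches the prompt because it shares no words with the question, which is the point: the model's limited context is spent only on the parts that look relevant.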

・Okay, so if I still want to use LangChain, can I combine Cursor and LangChain? Probably not…
・On a related note, the web version of GPT can’t read a large number of files the way Codebase can. Does Codebase improve GPT’s accuracy compared to regular GPT, or is it the same?

  1. I’m not sure if I understand your question correctly. LangChain is a framework for developing applications powered by language models. It has helpful methods that you can use when developing your own AI application to make the AI context-aware, for example. Cursor is its own application that has its own logic to make the AI context-aware.

  2. The ChatGPT website can’t read a large number of files because it’s limited by its own context size. It doesn’t have indexing logic built into it like Cursor, which makes it able to retrieve the most relevant snippets, as I described in my previous answer. Yes, it will greatly improve the accuracy of the AI answers when you ask something about your codebase. With ChatGPT, the relevant parts of your codebase would likely be excluded from the limited context, and it wouldn’t even know what code you wrote.

・Thanks for the great answer. GitHub Copilot doesn’t have an indexing feature either, right?

Correct. As far as I know, Copilot only sees the code that’s actually visible in your Visual Studio Code window, and that can’t be a lot of code, so they don’t need indexing.

So this means Cursor’s GPT-4 can’t understand content spread across multiple folders, and can’t process it either?

@boqihua123 Please read my reply here. Cursor can handle multiple folders.

@Jakob is that still the case in February 2025? Does Cursor still have issues with “Add Folder to Workspace…” and with accessing all of the context?

Hey, Cursor does not yet support workspaces made up of files or folders in different directories on your system.

As Jakob’s original answer stated, while you can @ files when using Workspace projects, codebase indexing will not index files from more than one source folder correctly yet.

We are hoping to add this functionality soon!

Related: Indexing only reads first folder in the workspace

@danperks when will this be available? This is one of the key features needed for me to keep using cursor in the long term. Who else doesn’t use more than one folder? Surprising this isn’t supported if one folder already works. I’d assume a simple os.walk recurse over what you guys already have should work. Why does that not work?

Hi @danperks, thanks for your reply. To avoid double posting: do you know why my solution of creating a “father” folder with symlinks doesn’t seem to work as expected? Full reference here: Indexing only reads first folder in the workspace - #28 by brando90 (feel free to respond over there).

What OS are you on?

If you’re on macOS or Linux, you can use symbolic links to link both repos inside a single main directory, then open Cursor in that directory. For instance, say you have two directories, ~/path/to/one/ and ~/path/to/two/. You can do the following:

mkdir ~/main
ln -s ~/path/to/one ~/main/one
ln -s ~/path/to/two ~/main/two

Now when you open Cursor in ~/main, you’ll have access to both repos.

We are working on this as we speak. I’d hope that it will be added in v0.49!

Amazing! If possible, may I ask what the expected timeline for delivery is?

I’m also genuinely curious why this is a challenge, since recursion is usually simple to implement. I understand if this is a proprietary detail that can’t be shared, but I’m interested!

Thank you!

The difficulty comes from indexing a codebase that doesn’t physically exist in the same directory on your machine. A lot of the relationship between files is dictated from where they are located relative to each other.

With workspaces, this structure is defined in a kind of “virtual filesystem” that is not actually in the same directory, so we have to do the work to support such a setup for our existing indexing functionality.
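As a rough illustration of that point (a sketch under assumptions, not how Cursor's indexer works), the Python below walks several unrelated root folders with `os.walk`. The function names and the example paths are hypothetical. Listing the files is the easy part; what changes is that a path only means something relative to its own root, so each file has to carry its root with it, and a relative path between two roots climbs out of both repos.

```python
import os
import posixpath


def list_files(roots: list[str]) -> list[tuple[str, str]]:
    """Walk each root and return (root, path relative to that root) pairs."""
    found = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full_path = os.path.join(dirpath, name)
                # The relative path alone is ambiguous across roots,
                # so it is stored together with the root it belongs to.
                found.append((root, os.path.relpath(full_path, root)))
    return sorted(found)


def cross_root_path(from_root: str, target_file: str) -> str:
    """Relative path from one root to a file in another (POSIX-style paths)."""
    return posixpath.relpath(target_file, start=from_root)


if __name__ == "__main__":
    # Hypothetical locations of two repos that do not share a parent project.
    print(cross_root_path("/home/me/projects/one", "/home/me/work/two/lib.py"))
```

With a single opened folder, every file sits under one root and relative paths stay inside it. With a multi-root workspace, anything that relies on where files are located relative to each other has to be reworked to account for the separate roots.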

Unfortunately, I don’t currently have a timeframe for when v0.49 will be available.

I noticed 0.49 is out and the change notes mention “better folder parsing” - can you confirm that working with a workspace with various folders now works? @danperks