How big of a codebase can Cursor work with?

Hi!

Just curious, what's the biggest codebase you've had Cursor successfully work with? Thx!

Depends on the model you're using. Claude 3.5 Sonnet has the biggest input context window, although in my experience it tends to give more concise responses, especially with larger input context. Summarised for you below:

| Model | Input Context Window | Maximum Output Tokens |
|---|---|---|
| o1-mini | 128K tokens | 65.5K tokens |
| o1 | 128K tokens | 65.5K tokens |
| Claude 3.5 Sonnet | 200K tokens | 8,192 tokens |
| GPT-4o | 128K tokens | 16.4K tokens |

[1,000 tokens ≈ 750 words, though this depends on the specific text]
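
As a rough worked example of what those window sizes mean in plain text (a sketch using the ≈750 words per 1,000 tokens rule of thumb above; code is usually denser than prose, so treat these as loose upper bounds):

```ts
// Convert a context window size (in tokens) to an approximate word count,
// using the rule of thumb that 1,000 tokens ≈ 750 words.
const WORDS_PER_TOKEN = 0.75;

function approxWords(tokens: number): number {
  return Math.round(tokens * WORDS_PER_TOKEN);
}

console.log(approxWords(200_000)); // Claude 3.5 Sonnet input:  ~150,000 words
console.log(approxWords(128_000)); // o1 / GPT-4o input:        ~96,000 words
console.log(approxWords(8_192));   // Claude 3.5 Sonnet output: ~6,144 words
```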

There is no hard limit on how big a codebase can be for Cursor to be effective; to a large extent, it comes down to how you use it.

Small codebases mean there isn’t much context the AI has to choose from, so it usually knows about the whole of your project when you are talking to it.

Much larger codebases obviously have a lot more files to choose from, and these AIs can only see a certain amount of context at once. Manually adding context using the @ symbol is the best way to ensure the AI sees what you are talking about.
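
For example, a chat prompt that pins the relevant files explicitly might look like this (the file paths here are made up; substitute your own):

```
@src/routes/login.ts @src/lib/session.ts
Why does the session cookie get cleared after the OAuth redirect? Please only change these two files.
```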

But we are working on improving the agent mode in Composer, which should allow the AI to find the relevant context itself.


I have the same question. My codebase is in multiple languages (Go on the back end; TypeScript and JavaScript on the front end using the Svelte framework). Currently the application has over 12,000 files, with about 1 million lines of UI code and about 3 million lines of back-end code. The question is how well Cursor can handle understanding context in something this large (in tokens, it's over 120 million tokens of code).
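
For scale, a quick back-of-the-envelope comparison of those numbers against the context windows listed above (just a sketch using the figures already quoted in this thread):

```ts
// Compare the codebase size quoted above against the largest context window in the table.
const codebaseTokens = 120_000_000; // ~120M tokens of code (figure from the question)
const contextWindow = 200_000;      // Claude 3.5 Sonnet input context

const ratio = codebaseTokens / contextWindow;                  // 600
const percentVisible = (contextWindow / codebaseTokens) * 100; // ~0.17

console.log(`Codebase is ~${ratio}x the largest context window`);
console.log(`At most ~${percentVisible.toFixed(2)}% of it can be in context at once`);
```

In other words, only a tiny slice of a project this size can ever be in context at once, which is why the advice above about selecting context deliberately matters.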

In addition to what Dan said, it comes down to:

- Structure of the codebase:
    - Big files mean there is too much code in one place; they should be broken up (not just for AI, but for humans too).
    - Too many files in a folder (over 100? or 1,000?) makes indexing harder and makes it difficult, for both you and the AI, to see what is related to what. As Dan rightly said, @ lets you select a file, but you need to know which one and why.
    - Approach: over-engineered solutions and inventions that add complexity without a distinct benefit tend to confuse the AI more than focused ones. This applies to code and architecture, but also to formatting, code style and comments.
- Documentation or requirements available: these make it easier for the AI to focus on specifics rather than filling in the blanks by generalizing.
- Tasks and rules: again, a specific focus on what changes you want to implement and what rules the AI has to follow (see the rules-file sketch after this list).
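
As a rough illustration of that last point, a minimal project rules file is one way to give the AI standing instructions, e.g. a `.cursorrules` file at the repository root (the specific conventions below are placeholders based on the stack described earlier in this thread; adapt them to your project):

```
# Project conventions for AI assistance
- Back end is Go; front end is Svelte + TypeScript. Keep changes inside the layer being discussed.
- Prefer small, focused files; avoid creating files much over ~300 lines.
- Follow the existing error-handling patterns rather than introducing new ones.
- When a change touches both back end and front end, list the affected endpoints first.
```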

As with everything, there can be too much or too little. The same really applies to, say, a new developer joining the team: without knowing the details, the new dev will have a hard time not just finding the right files but also doing any task.

Overall, it's more likely that a bad experience comes from a lack of understanding of how LLMs work, how to prompt properly and how to fit them into the dev workflow than from just a 'large' codebase.

We are seeing that new apps are actually starting to look different from those of the past, because AI can apply the latest best practices, generate less unnecessary code, avoid 'not invented here' thinking and focus on implementing features instead.