How big of a codebase can Cursor work with?

Hi!

Just curious, what's the biggest codebase you've had Cursor successfully work with? Thx!

Depends on the model you're using. Claude 3.5 Sonnet has the biggest input context window, although in my experience it tends to give more concise responses, especially with larger input context. Summarised for you below:

| Model | Input Context Window | Maximum Output Tokens |
|---|---|---|
| o1-mini | 128K tokens | 65.5K tokens |
| o1 | 128K tokens | 65.5K tokens |
| Claude 3.5 Sonnet | 200K tokens | 8,192 tokens |
| GPT-4o | 128K tokens | 16.4K tokens |

[1,000 tokens ≈ 750 words, though this depends on the specific text]
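
As a rough worked example of what those window sizes mean in plain text (a sketch using the ≈750 words per 1,000 tokens rule of thumb above; code is usually denser than prose, so treat these as loose upper bounds):

```ts
// Convert a context window size (in tokens) to an approximate word count,
// using the rule of thumb that 1,000 tokens ≈ 750 words.
const WORDS_PER_TOKEN = 0.75;

function approxWords(tokens: number): number {
  return Math.round(tokens * WORDS_PER_TOKEN);
}

console.log(approxWords(200_000)); // Claude 3.5 Sonnet input:  ~150,000 words
console.log(approxWords(128_000)); // o1 / GPT-4o input:        ~96,000 words
console.log(approxWords(8_192));   // Claude 3.5 Sonnet output: ~6,144 words
```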

There is no hard limit on how big a codebase can be for Cursor to be effective; to a large extent, it comes down to how you use it.

Small codebases mean there isn’t much context the AI has to choose from, so it usually knows about the whole of your project when you are talking to it.

Much larger codebases obviously have a lot more files to choose from, and these AIs can only see a certain amount of context at once. Manually adding context using the @ symbol is the best way to ensure the AI sees what you are talking about.
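
For example, a chat prompt that pins the relevant files explicitly might look like this (the file paths here are made up; substitute your own):

```
@src/routes/login.ts @src/lib/session.ts
Why does the session cookie get cleared after the OAuth redirect? Please only change these two files.
```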

But we are working on improving the agent mode in Composer, which should allow the AI to find the relevant context itself.


I have the same question. My codebase is in multiple languages (Go on the back end; TypeScript and JavaScript on the front end using the Svelte framework). Currently the application has over 12,000 files, with about 1 million lines of UI code and about 3 million lines of back-end code. The question is how well Cursor can handle understanding context in something this large (in tokens, it's over 120 million tokens of code).
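
For scale, a quick back-of-the-envelope comparison of those numbers against the context windows listed above (just a sketch using the figures already quoted in this thread):

```ts
// Compare the codebase size quoted above against the largest context window in the table.
const codebaseTokens = 120_000_000; // ~120M tokens of code (figure from the question)
const contextWindow = 200_000;      // Claude 3.5 Sonnet input context

const ratio = codebaseTokens / contextWindow;                  // 600
const percentVisible = (contextWindow / codebaseTokens) * 100; // ~0.17

console.log(`Codebase is ~${ratio}x the largest context window`);
console.log(`At most ~${percentVisible.toFixed(2)}% of it can be in context at once`);
```

In other words, only a tiny slice of a project this size can ever be in context at once, which is why the advice above about selecting context deliberately matters.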

In addition to what Dan said, it comes down to:

- Structure of the codebase:
    - Big files mean there is too much code in one place; they should be broken up (not just for AI, but for humans too).
    - Too many files in a folder (over 100? or 1,000?) makes indexing harder and makes it difficult, for both you and the AI, to see what is related to what. As Dan rightly said, @ lets you select a file, but you need to know which one and why.
    - Approach: over-engineered solutions and inventions that add complexity without a distinct benefit tend to confuse the AI more than focused ones. This applies to code and architecture, but also to formatting, code style and comments.
- Documentation or requirements available: these make it easier for the AI to focus on specifics rather than filling in the blanks by generalizing.
- Tasks and rules: again, a specific focus on what changes you want to implement and what rules the AI has to follow (see the rules-file sketch after this list).
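
As a rough illustration of that last point, a minimal project rules file is one way to give the AI standing instructions, e.g. a `.cursorrules` file at the repository root (the specific conventions below are placeholders based on the stack described earlier in this thread; adapt them to your project):

```
# Project conventions for AI assistance
- Back end is Go; front end is Svelte + TypeScript. Keep changes inside the layer being discussed.
- Prefer small, focused files; avoid creating files much over ~300 lines.
- Follow the existing error-handling patterns rather than introducing new ones.
- When a change touches both back end and front end, list the affected endpoints first.
```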

As with everything, there can be too much or too little. The same really applies to, say, a new developer joining the team: without knowing the details, the new dev will have a hard time not just finding the right files but also doing any task.

Overall, it's more likely that a bad experience comes from a lack of understanding of how LLMs work, how to prompt properly and how to fit them into the dev workflow than from just a 'large' codebase.

We are seeing that new apps are actually starting to look different from those of the past, because AI can apply the latest best practices, generate less unnecessary code, avoid 'not invented here' thinking and focus on implementing features instead.