An Idiot's Guide To Bigger Projects

:warning: Warning: Mammoth post ahead :warning:

Estimated reading time: ~6 mins, or around 0.000014% of your life.

If you’ve been using Cursor for a while, and started to get into more complex projects with it, you’ll almost certainly have come across the challenges of keeping things on track.

It’s not something specific to Cursor of course, it tends to be the nature of AI-assisted coding in general. Small context easy, big context hard. The models have come a long way, but in general humans still tend to do better at knowing what actually belongs in the bigger picture.

Ok, but what is this post? Why would I read it?

I’ve been using Cursor (and other LLM tools) for over a year, so I figured I’d share some of the tricks that have helped me to handle some larger projects. YMMV, but hopefully this might help some folks to squeeze some extra juice out of the AI-assisted orange. :tangerine:

1. Use Composer

Chat’s great for those little quick questions and asides, but (at the time of writing) it doesn’t checkpoint like Composer does. In case you haven’t spotted it, Composer checkpoints your conversation. If things are going sideways, or you’re spending too much time on bugfixes, you can easily track back to an earlier known-good state before the model started mangling your code. :grin:

2. Git, git, git

Git. Not everyone loves git. Some parts of it are hairy-scary. Thankfully most of it is abstracted away by VS Code, so you won't have to touch the scary bits. But if you're not using it, and I mean frequently, you'll eventually hit a moment where you discover that a mis-applied diff obliterated one of your files a while ago, and it's too late to turn back the clock. :clock1030:

Storage is cheap, so commit little and often. Don’t fall into the trap of waiting until your project is perfect before you commit, because it almost never is. Git commits are not milestones worthy of a fanfare, they’re handy little checkpoints. Learn how to turn back time, and it’ll save you hours of heartache. :mending_heart:

3. Introductory prompts

If you want the model to love decoupling and encapsulation, tell it that it does. If you want it to be an expert in the tech you're using, tell it that it is (I'll explain why down at the very bottom). I usually start every Composer session with some boilerplate that encourages good coding practices – you can even find some of these shared online – and if it starts to drift away from that as the context blurs, I start a new session.

3b. Notepads

If you don’t have a notepad describing your goals, and have it permanently stapled to your Composer context, you’re seriously missing out. This one should be considered pretty much essential for any project worth more than a handful of queries, as it’s easily the best way to keep the model on track and well-“motivated”.

4. Shorter Composer sessions

The temptation (a bit like putting off your git commits) is to wait for something to be “just so” before you move on to the next topic and session. But if wayward issues and bugfixes derail your coding, you can end up with very long Composer sessions.

From my experience, not only do these make the UI very sluggish, they also degrade the quality of the output. My most joyous experiences with Composer tend to be near the top of a session, and the hair-pulling parts tend to be when the session is getting too long – that’s when the hallucinations and the circular changes happen (“Ah yes, A will not work, you must B [applies]”; “Ah yes, B will not work, you must A”; repeat). You can also spot this when the model starts spontaneously answering an earlier question instead of the one you just asked… :person_facepalming:

Get used to going again in a fresh session: re-paste your enthusiastic boilerplate, explain where you got up to and what’s broken, and continue with better, cleaner results. :white_check_mark:

5. Demand a plan

My most successful sessions tend to follow a standard format:

  • Throw in the boilerplate prompt about how great the AI is at the chosen languages and libraries
  • Explain the high level view of the system
  • Explain what you want to achieve next
  • Tell the AI not to code yet, but to summarise what you’ve said, and then output a plan for greatness. Take a copy of this and be ready to paste it back into the conversation if the context seems to be degrading.

Then every prompt or every few prompts, request that the AI replies by:

  • Explaining in detail what it’s going to do next, being clear about the logic, how a good solution should work and why it’s a great idea (this leads to better code quality in my experience)
  • Making the code changes
  • Recapping the plan and articulating what’s next

Going through this cycle of making the AI explain itself and recap the plan seems (seems, YMMV) to keep the bigger picture fresher in its most recent context and stop things getting all mangled. :knot:

PS: Also consider pasting it into a Notepad and including that in your context, because life is short.

6. TDD

You’ve heard of Test-Driven Development right? If you’re a CompSci type, you may recall it as that thing you feel like you’re supposed to love, because you can demonstrate that your code isn’t broken and it meets your functional spec. Enumerate your assumptions, write the tests, then write the code to make the tests pass.

It’s the same thing that, on smaller projects, has many coders going “ehhhhhh… I just want to write the code: writing the tests is going to make everything take twice as long”. It’s also often tedious as ****. And of course many folks then develop a habit of skipping it, because life is short and who has the time, right?

BUT: AI!

This was a pretty big revelation when I started trying it out. For mid-sized projects, go through the initial descriptive steps as usual, but don’t ask the AI to write the code for you first: ask it to write the tests. That’s traditionally the boring, time-consuming bit, the part that humans hate and AI is pretty darn good at (most of the time).

Make your initial conversations code-free, explore the topic and the system design, get the model to describe an ideal system for you, and then tell it to generate the code structure but with most of it mocked up or left as “TODO”. But then make it write the unit (and later integration) tests for you. :clipboard:
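
For instance, here’s a rough (hypothetical) sketch of what that mocked-up stage might look like in Python, before any real implementation exists – the function and test names are invented purely for illustration:

def parse_temperature(raw):
    return None  # stubbed out for now -- the real parsing comes later

def test_parse_temperature_celsius():
    # deliberately fails until parse_temperature is actually implemented
    assert parse_temperature("21.5C") == 21.5, "Couldn't parse a simple Celsius reading"

A failing test isn’t a problem at this stage – it’s the target the implementation has to hit later.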

Then when you start to get into the code proper, not only can you point out the model’s failings by pasting the failed tests back at it – much easier than convincing it to understand your hand-wavy description – but also the AI always has the test harness to read, which is a great way to keep success criteria fresh in its context. It’s machine-readable, unambiguous and describes your desired system behaviour, and it’s always there to verify your assumptions. :ballot_box_with_check:

For me, taking this approach has enabled a lot of AI-assisted work that might otherwise have been too complex for straight-ahead coding.

Final thoughts

Phew, this turned out to be a lot longer than I expected. If you got this far, congratulations, have a GDPR-compliant cookie. :cookie:

One last comment about what the AI is doing for you, and then I promise I’m done.

It’s this: don’t forget that LLMs are essentially sophisticated automated con-artists with access to a lot of training information. Their only trick is to tell you what they “think” you want to hear, and they will do whatever it takes to keep up the illusion that they’re actually intelligent. :nerd_face:

It’s why you get hallucinations (for fun, try asking ChatGPT which specific episode of your favourite TV series has a particular character wearing a bright blue jacket – it’ll make up almost anything to sound convincing).

It’s also why telling them they’re experts kinda works. They will respond in a way that sustains the illusion, choosing words (and syntax) that would be consistent with an expert response. And it’s why it can sometimes even help to tell them to go slowly and take their time. They don’t actually take any longer to respond, but sometimes you get higher-quality answers, because they’re more in keeping with what a careful, methodical expert would say – again to preserve the illusion. :woman_mage:

Okay, I’m done for real. Hope this ends up being useful to at least one person using Cursor for AI-assisted coding, or at least gives some AI-related food for thought. If it did that for you, feel free to leave a Like, or maybe even reply with your own tips to help bump it up on this busy forum. :innocent:

PS: The title isn’t me being rude to you. The idiot is me.

Addenda et Corrigenda

  • Don’t say “yes”. Based on bitter experience, if the model says “Would you like me to go ahead and implement that for you?” and you answer “yes”, you have a non-zero chance of it leaping back to an irrelevant part of the conversation and trying to relive that moment. Always give a clear instruction like “yes, please go ahead and implement the changes you have just recommended” to save yourself the bafflement and burnt credits.
  • Do demand debug logging. If things aren’t working out properly – and especially if you get into the dreaded “Don’t do A, do B; Don’t do B, do A” loop – tell the AI to put in vast amounts of debug logging, and then paste those logs back to it wholesale. The models’ capacity for reading pages of tedious logs to spot the one tiny error is definitely superhuman. It’s usually a pretty good way to highlight to the model where it’s made an incorrect assumption. Sometimes even the act of making the model include the logging can cause it to fix elusive bugs, much like “talk me through your steps” works with a junior dev. :slight_smile:
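
As a rough illustration of the kind of logging I mean (the function and messages here are hypothetical, using Python’s standard logging module):

import logging

logging.basicConfig(level=logging.DEBUG, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger(__name__)

def apply_discount(price, rate):  # hypothetical function the AI keeps getting wrong
    log.debug("apply_discount called with price=%r rate=%r", price, rate)
    result = price * (1 - rate)
    log.debug("apply_discount returning %r", result)
    return result

Run the failing scenario, copy the log output wholesale, and paste it straight back into the conversation.
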
28 Likes

This is generally true of all AI, not specific to Cursor. I love how, every time it gets close to running out of context memory, it starts giving you the runaround until you notice that’s what’s happening – and if you don’t notice, lol, good luck circling the Earth.

3 Likes

Hahah, so true! :laughing: I think most of the ramblings above generalise pretty well to other LLM use too (save for things like Composer vs Chat). On another thread here I think I described AI-assisted coding as sometimes feeling like wrangling a wet, soap-covered snake in a pitch-black room, and you’re so right about that feeling creeping in at the context limit. Couldn’t agree more – spotting it early is vital! :grin:

Could you expand on how you write tests before the code, before the proper implementation?

I’m new to dev, so this would be something I would love to test out.

Sure! TDD is a whole art and discipline of its own, but the basic pattern you repeat is:

  1. Decide on the functionality.
  2. Write tests to prove that functionality: these don’t always have to be complex or completely exhaustive, they’re often just to try things out under different conditions.
  3. Mock up the functions (function bodies that just do something like a ‘return null’ for now).
  4. Run the tests and watch them fail - this is a good thing, because it shows you didn’t mess up your test harness or break your testing code.
  5. Implement the functions.
  6. Run the tests again and watch them pass, or if they didn’t, revisit your assumptions and fix the code. Or if you’re really sure it’s a test issue, fix the test.
  7. Repeat.

Example

So for a trivial example, say you were building a machine to multiply two real numbers together. (I’ll use Python-like syntax here since it’s pretty readable)

Step 1: We need a multiplier function, and we’ll call it do_multiply, and it’ll take two params. We need it to support negative numbers and non-integers.

Step 2: Write test cases

def test_multiplier_a(): # basic test
   assert do_multiply(3, 2)==6, "Oh no! We couldn't multiply three and two!"

def test_multiplier_b(): # negative first operand
   assert do_multiply(-45, 2)==-90, "Error multiplying negative first operand"

def test_multiplier_c(): # floating point example
   assert abs(do_multiply(0.5, 4.2) - 2.1) < 1e-9, "Error multiplying floating point operands"

# ... etc. for any other cases you care about

In Python, assert is a statement that takes a condition you’re saying (asserting) should be true, followed by a message to display if it turns out not to be. In reality you can probably be more sophisticated than using assert – the AI models can help you to build something far nicer. Caveat: I’ve written this example here without testing it, so typos are possible.

Step 3: Implement dummy version

def do_multiply(a, b):
   return 0 # whatever

Step 4: Admire your failures as your mocked-up do_multiply does nothing useful.
To see them in all their glory, you’d usually use some wrapper code to run all your tests in sequence, catch any exceptions and deliver them in a nice neat output format. You might also have some extra code to do setup (if your code needs any) and teardown (to clean up afterwards). Altogether this will be known as your ‘test harness’, and the AI models are pretty competent at writing this sort of thing for you.
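
If you’re curious what that wrapper might look like, here’s a minimal homebrew sketch (no framework; it assumes the test functions from Step 2 exist in the same file – a harness generated by the AI would likely be fancier):

import traceback

def run_tests(tests):
    passed, failed = 0, 0
    for test in tests:
        try:
            test()  # run one test function
            passed += 1
            print(f"PASS  {test.__name__}")
        except AssertionError as e:
            failed += 1  # an assert condition was false
            print(f"FAIL  {test.__name__}: {e}")
        except Exception:
            failed += 1  # the code under test blew up entirely
            print(f"ERROR {test.__name__}:")
            traceback.print_exc()
    print(f"{passed} passed, {failed} failed")

run_tests([test_multiplier_a, test_multiplier_b, test_multiplier_c])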

Step 5: I’d give sample code for how to multiply two real numbers, but it would just be so complex :laughing:

Step 6: Re-run the test harness, and see all your wonderful successes.

Step 7: That was so much fun you’ll want to do it all again.

Isn’t that a lot of effort though?

It probably seems like it with such a trivial example, because we’ve written more test code than implementation. But at scale, when you have functions that are dozens of lines long, and you can check them with a couple of lines, it’s not so bad. And now in the modern age, it’s even easier, because the AI models will write the tests for you!

I mean, you still have to read the tests back and check that it hasn’t hallucinated in the writing of them, otherwise you’ll get misleading results. But that’s a lot less heavy lifting than doing it all manually.

Also definitely get the models to write your overall test harness, because they’ll do that in the blink of an (A-)eye.

Okay but where’s the benefit?

  • You have a clean, machine-readable, ambiguity-free set of tests that you can use to slap the AI’s virtual face with when it gets things wrong in implementation.
  • You can also spot your own mistakes the same way, but elect not to slap yourself.
  • When you work on new stuff (and this is the biggie) you can re-run all your existing tests and verify that nothing broke.

One of the most frustrating things with AI-assisted coding is when your Module A is working, and you move on to Module B, and by the time the AI’s done twisting your code to meet your Module B spec, you find that Module A got broken somewhere and you didn’t spot when it happened. With TDD, you can re-run your tests on every significant change, and make sure nothing gets silently corrupted while you’re looking the other way.

Summary

It takes a bit of getting used to, but with a little practice you can develop a habit of encouraging your AI (e.g. in Composer) to begin with the test harness and work with TDD. The models have all heard of it, so it’s nothing too esoteric to ask for. Try it out on some scratch test projects first and you might just fall in love with it.

Hope this helps. Have fun!

4 Likes

Thanks for the elaborate breakdown. I’ll have to give this a shot.

1 Like

You’re most welcome. Good luck with your coding adventures and have fun :smile:

Excellent post! Thanks for sharing, I will apply your tips. :+1:

1 Like

What is better: giving it the steps, or letting the LLM decide the steps and the order? So far I prefer to tell it the steps, so I can get to the checkpoints in the order that I want.

1 Like

I think whatever works for you really. It’ll probably depend a bit on your project and how many moving parts there are.

From my experience, the LLMs (especially sonnet) seem pretty competent at setting out a plan, though sometimes it can get a little ambitious – okay, just write my script, don’t re-architect the entire internet…

Generally though, if I’m going to ask the LLM to design the steps, I won’t let it plough ahead right away. I’ll ask it to write the plan without changing any code, and then I’ll go through and choose what I agree with and what I don’t, maybe debate it a little (not too much, credits are precious), and then I make sure I’m the one controlling the order of operations. Often after writing a plan (if you tell it not to touch the code) it’ll even ask if you’re happy and what to concentrate on first/next.

So in summary, personally I mostly let the LLM propose the steps, and then I control the order and any refinements, to get that control you’re talking about.

Writing tests before the code is a great idea. But in complex cases it’s not viable. What would you write to test a React UI, for example (before it’s created, I mean)?

Also I find longer Composer threads more useful. It usually goes sideways in the beginning, but the more examples and explanations you feed it, the more on point it becomes! That’s actually why I believe that Cursor should simply build up a memory of all the things we did on the given project from the start. Basically remember, or be able to navigate through, all previous Composer threads (that would be much more useful than referencing static Notepads). I think if it stopped forgetting what we did previously in every new Composer thread, it could do a much better job.

As I say, your mileage may vary, so feel free to take as much or as little from the above as suits your needs. But actually I’d argue that it’s sometimes even more useful in those complex cases. You can definitely use TDD for a lot of UI work.

To use your React UI example, let’s say you want to add to your interface a component for an on/off toggle button (simple example, but it does generalise). You can use libraries like Jest to assist with the tests, or get the AI to build your own custom testing framework. Tests first:

// ToggleButton.test.js

import React from 'react';
import { render, screen, fireEvent } from '@testing-library/react';
import '@testing-library/jest-dom/extend-expect';
import ToggleButton from './ToggleButton';

describe('ToggleButton Component', () => {
    test('renders with the label OFF and toggles to ON when clicked', () => {
        render(<ToggleButton />);
        
        const button = screen.getByRole('button', { name: /off/i });
        
        // Initial state should be "OFF"
        expect(button).toHaveTextContent('OFF');

        // Click the button and expect it to change to "ON"
        fireEvent.click(button);
        expect(button).toHaveTextContent('ON');
    });
});

Full disclosure, example part-generated by AI, because the models are quick at this sort of thing.

Then all you’d have to do is write the minimum code to pass that test, and there’s your verifiable React UI toggle button. Every time your test suite runs, you can be certain the AI hasn’t clobbered your button in pursuit of some other goal.

And if you fall in love with testing and decide to take the paradigm further, again with AI help, you can integrate things like Cypress for full end-to-end interaction testing. That’s where it gets extra powerful, because you have your logical verification at the component level so you know all the individual bits of your UI work, and you have more complex tests on top to verify your flow.

Like I say, YMMV; if it’s not for you, it’s not for you. But maybe give it a try just to be sure – the AI’s pretty good at it and you might just find you love it.

As for the comment about longer Composer threads, you’re braver than I am then! For me, long threads almost always involve wasting far too many request credits.

If you’re finding it’s going sideways early on, you might find it’s not picking up enough context to start with. Always try to give it that up-front information (see #3 and #5) and if you’re working on your existing codebase, don’t forget to add all the files that are potentially relevant into the context before you start.


Every time I forget to do this, I wonder why the AI’s being so dense, and then I realise that I am.

You mentioned how the more examples and explanations you give it, the more on point it becomes – that makes a lot of sense. I’d just try to front-load as much of that information as possible: get it in there early, right in the first query.

When you start the session you’ll have a fresh new wide-open context to fill with insights, so don’t hold back. If you wait for a back-and-forth debate before you reach a common understanding, it likely doesn’t have enough to go on at the start, and that back-and-forth will buuuurn through your usage credits in no time. Go heavy early.

If you do that, you’ll likely be more credits-efficient and reach what you want sooner… and then when you do get to really long Composer sessions, you’ll probably start to notice that degradation as your initial context starts to become a blurry distant memory. That’s when you’ll want to start another one fresh, rinse and repeat.

As with all the above, take from it what you like, discard what you don’t; refine your own favourite best practices for maximum coding joy :slight_smile:

1 Like

Incidentally, my comments about Composer vs Chat in the original post weren’t intended to imply that Chat isn’t useful. I think it’s very useful, especially for asides, and where you don’t want to pollute the context of your Composer session with a tangential or unrelated query. :slight_smile:

@three do you know about the .cursorrules file and its ability to act like a system prompt?

1 Like

Thanks @v-karbovnichy, that’s a great point to raise!

Have you had success using this with Composer? I didn’t have much luck with it back in the pre-Composer days, and since the docs only mention “Chats and Cmd-K sessions” I’ve never really revisited it.

Personally I think I’d probably still prefer the flexibility of tuning the focus on each new session, but if you’re using it successfully that would be good to know, some folks might prefer it as an approach. Maybe we can even ask Team Cursor to update the docs in that case! :slight_smile:

Or just use Go and build your project out in modules that are relatively isolated from each other outside of your main.go. Isolating by package and main keeps Cursor’s context requirements down when generating code.

1 Like

Hey @WhatIsAModel - you raise a great point! It’s one that I think extends beyond Go to a lot of languages: more modular design.

Sometimes it’s a lot of extra effort to convince the models to leave alone the bits that you don’t want them to modify. The other day I’d put in a couple of hours making sure all my code for something was neatly encapsulated and decoupled, with all the right bits owning the right data. I asked the AI for a tweak and the first thing it did was go and start writing hard-coded “special cases” in the parent module that said “if the child is this, then do that”. Super ugly.

The larger the system, the more we should still rely on “classical” techniques to separate concerns, and I love your observation that that may mean keeping modules out of the context altogether. Sometimes I think it may make sense to expose only the interface to the model, not the implementation.

I’m going to give this some careful thought and might add an extra point about it to the top post. Thanks for bringing this up!

1 Like

Still experimenting with the best way to recommend an interface-driven, highly decoupled approach. But I realised I forgot something ridiculously important in the guide above: Notepads. I’ve added it in.

Thank you for these fantastic tips. With regard to Composer, would you say it should be primarily used for brand new projects (for scaffolding or immediately after scaffolding)? Or is it also suitable for existing, large codebases?
For TDD, you used Python as the main example. Would you say asking Cursor to generate tests with a library like PyTest or similar is the way to go? I often prefer to write tests without a framework, because it has less boilerplate and makes it easier to debug. But of course, it’s not as … systematized.

1 Like

Thanks, you’re most welcome! As I say, that stuff is unscientifically cobbled together from my experiences, so if you find yours differ, please do share and I can update the post with any additional insights.

For new vs existing, it seems pretty capable with either (especially with the most recent sonnet update). When you’re starting a new Composer session – which I’ve recommended doing before you hit context limit issues – you’re sorta kinda effectively starting again with an “existing codebase” anyway, because your files won’t be in the conversation context at that point.

I tend to start a new Composer session by insisting it thoroughly reads and then summarises the source. Might be placebo effect, but it does seem like having that summary available in the context helps it to stay on track during subsequent exchanges. I’m not sure how much of the “read the codebase” is being condensed by Cursor’s own models and how much is being done by the main LLM, but either way it seems like having that “crib sheet” reference handy does no harm, and probably helps.

As for TDD, I’d say whatever you’re most comfortable with. Despite my exhortations not to be lazy about getting your test coverage in, I’ve mostly been… well, lazy enough not to insist on 100% coverage, using it more as a means to keep the models on track. So I’ve mostly been frameworkless too and it’s been working fine for me.

I think it depends somewhat on the project too. If you’re coming up with your own for-fun small-to-medium hack projects, you can pretty much go frameworkless and rely on the model to set up something “good enough” for executing your harness. Although bear in mind you probably will burn a few credits on refining the harness first - things like:

‘this test failed with an exception but you said it passed’
‘You’re quite right and I apologize. We were catching the exception silently and not updating the results. Let me not screw that up for you this time.’
or
‘when any test fails, the whole of the rest of the harness stops executing’
‘You’re a master of observation and I appreciate your world class attention to detail. Well done for spotting something so subtle. Let me refine this minor detail for you now.’
… or whatever. :smirk:

Once things are past that stage, homebrew has been just fine for me. I’ve been mostly Svelte-ing recently, so having a front end page that runs all the tests and displays them beautifully has been very nice.

BUT

On the other hand, if you’re putting together a serious project where you might want to share it with the world, or control a nuclear power plant with it*, or care about people peering at your code and going “ewww don’t you knowwww how to use a testing library?!”, then you may find the extra effort to show your code coverage badge is worth it. Those kinds of libs do at least tend to give you numbers on coverage, so there’s a demonstrable benefit there.

As with all of this thread, there’s no substitute for trying it out and finding what gives you the best happy feelings, so always take my musings with a pinch of salt. But that’s how my work has been turning out recently. Hope that helps!

* please do not do this.

1 Like