A tip for coding with the agent: getting better results

I’ve been using Cursor since spring now. I’ve had a number of months with it, and I’ve learned a lot. I learn more every day. So here’s a tip for those users out there who seem to be struggling.

Pay close attention! If you are vibe coding, the inclination is to just let the “AI” do it all. However, the fundamental problem is that “AI” is not actually intelligent at all, at least not when it’s an LLM we are really working with. If you think “AI” will just solve all your problems for you by interpreting vague prompts with little context, you are going to hurt yourself more than help yourself, and you will ultimately spend exorbitant amounts of time (and tokens, thus money) resolving all the problems that arise in your product.

So instead of just tossing out a lightweight prompt and then stepping back… Don’t step back. Lean in. First, your prompt: a sloppy prompt will lead to sloppy results. Initially, the capabilities of the agent+LLM powering your vibe coding efforts seem really amazing. Then the novelty wears off, and you end up in the endless slog of trying to make a real product out of that vibed-out code. Lean in, don’t lean back.

Prompts need context. They need details. They need YOU to bring the intelligence. I’m coining a new term. BYOI: Bring Your Own Intelligence! “AI” is pretty much anything but intelligent! LLMs are ultimately very fancy, very advanced knowledge bases, but they are NOT intelligent, not in the least. They are not cognitive. As such, they cannot actually do REAL problem solving.

So, BYOI, and deliver the intelligence yourself. Think about your prompts, craft them better, and keep striving to craft them better than before. Always attach the relevant context. ALWAYS attach relevant documentation (look into the docs indexing feature of Cursor!!!) Docs are a LIFESAVER! ESPECIALLY if you don’t know what you are doing. Docs totally EMPOWER the agent and make it significantly better! As you get deeper into larger and larger volumes of code, prompt craft becomes more and more important. Start thinking about things like SCOPE: What areas of the code should the LLM be touching? What areas should be totally off limits? If your prompt covers these, it will help corral and contain the LLM’s activities. If you think the LLM isn’t going to know the best answers off the top of its neural network, instruct it to search online for the most relevant details of X, Y, and maybe Z. Keep pushing your prompt craft.

Lean IN!

Once you have crafted a good prompt, stacked with the necessary context, docs, search requests, etc., lean in again! Watch what the agent and LLM are doing. Keep an eye on things. Pay particularly close attention to certain commands, especially file deletions or “reset” attempts where the agent+LLM wants to backtrack heavily. Backtracking may occasionally be necessary, however when it really, truly is…it is often better for you to find the previous prompt and REVERT to it (and revert the changes), then refine your prompt more, include more relevant context, and try again. Letting the agent try to “backtrack and fix”, or “backfix” as I call it now, is often a recipe for monstrously token-wasting disasters. It’s often much more efficient and effective to FIX THE PROMPT than to have the agent figure out where it went wrong.

However, there are often other opportunities to guide the agent along the current path, but in a slightly better direction. Sometimes the agent will…try to delete a file that it thinks became too complex or complicated, and try something else entirely. Or maybe it tries to run a command that is not in your allow list, and the command seems…suspect, or sketchy. These are OPPORTUNITIES for you to take a closer look, step in, and nudge the process if necessary. Nudging, with the right prompt, is often a better solution than backfixing or even reverting.

Lean IN!

Keep an eye on the work you are having the agent do for you. You will find opportunities to reduce cost and save yourself time, tokens, and money if you are at least somewhat attentive, rather than just purely surfing the vibes.

EXEMPLAR CASE

The event that spurred this post was a process I watched claude-4-sonnet :brain: go through while it worked on a prompt (not that complex of a prompt overall, although it definitely had the necessary context attached). The high-level goal was to implement an integration test suite for a new set of controllers, API services, business logic services, and data services. The only thing I wanted mocked was the DB layer (Prisma in this case); otherwise, I wanted ALL the rest of the REAL code to be integrated in these tests, so that the full integration was evaluated. This is a Nest.js app: TypeScript, modules, dependency injection, all that good stuff.

Overall, I thought things were going quite well. The prompt gave sonnet what it needed to fully understand the problem. However, after about 5-7 minutes or so of working on it, it started to question itself. I wasn’t questioning it; every step it took, every thought cycle, looked like it was on the right track to me. But it suddenly decided to just delete the entire integration test file. That was about 700 lines of effort in and of itself…a decent amount, especially considering not all the test cases I wanted were implemented at that point.

Well, deleting a file is one of those opportunities where the agent PAUSES. So, instead of either rejecting or accepting that delete…I crafted another prompt instead. I had an idea. MY ACTUAL INTELLIGENCE was able to see something the model could not (and perhaps, with the right context or a more advanced prompt, it might have, but it did not).

The problem the agent was having was that getting a full integration test working actually required a lot of other code to be involved. Lots of other modules, spanning a fairly broad swath of the overall product. The agent kept finding more, and more, and more functionality it needed to include. After a certain point the agent decided the problem was too complex. Honestly, I don’t blame it, however…there was a solution (one I’d implemented myself in another life): build a tunable, configurable integration-test module that effectively serves as a stand-in for the app module (which brings in far too much), bringing in all the “boilerplate” necessary to get the fundamental infrastructure working for fully functional Nest.js controllers, APIs, and HTTP calls with real requests/responses. On top of that, there were some custom modules I’d created, as well as all my domain-related modules, that also needed to be included.

So rather than allowing the agent to delete the entire integration test suite it had just worked so hard on, or just rejecting the delete and leaving the next action up to the arbitrary nature of the model, I instead instructed the agent to switch tack toward making a reusable integration testing module, with some configurability, and maybe a helper function to get it set up. The model thought for about 8 seconds, decided the idea was brilliant and solved its previous predicament, then proceeded to take my relatively simple idea, crafted in a very simple prompt, and create a richer system for generating modules for integration testing Nest.js controllers end to end, but with the ability to mock Prisma (the one layer of code I explicitly wanted mocked).

Ultimately, all the complexity that the model had become concerned about dwindled into this one simple line of code:

```ts
const TestModule = createIntegrationTestModule([UsersModule, UsersDomainModule]);
```
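For anyone curious what a helper like that can look like, here is a rough sketch of the shape of the idea. To be clear, this is my own illustration, not the code sonnet generated; the function signature, the @nestjs/config usage, and the file name are all assumptions:

```ts
// integration-test.module.ts -- my own sketch of the idea, NOT the generated code.
// Substitute whatever shared infrastructure modules your app actually needs.
import { DynamicModule, Module, ModuleMetadata } from '@nestjs/common';
import { ConfigModule } from '@nestjs/config';

export function createIntegrationTestModule(
  domainModules: NonNullable<ModuleMetadata['imports']> = [],
): DynamicModule {
  // A throwaway module class that stands in for the real app module.
  @Module({})
  class IntegrationTestModule {}

  return {
    module: IntegrationTestModule,
    imports: [
      // The "boilerplate" infrastructure every integration test needs.
      ConfigModule.forRoot({ isGlobal: true }),
      // The domain modules a particular test suite actually exercises.
      ...domainModules,
    ],
  };
}
```

The point is simply that the returned module pulls in only the shared infrastructure plus the domain modules a given suite needs, instead of dragging in the entire app module.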

Sonnet had actually already reduced the complexity of setting up Prisma mocking into a single, simple function call as well:

```ts
mockPrismaService = createMockPrismaService();
```
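Under the hood, a factory like that can be little more than a deep mock of the Prisma client. Again, this is a sketch under assumptions of mine (jest plus the jest-mock-extended package), not the actual generated helper:

```ts
// prisma.mock.ts -- sketch; assumes jest and jest-mock-extended are installed.
import { PrismaClient } from '@prisma/client';
import { DeepMockProxy, mockDeep } from 'jest-mock-extended';

// Every model method (user.findUnique, user.create, ...) becomes a jest mock
// that individual test cases can configure to match their own expectations.
export type MockPrismaService = DeepMockProxy<PrismaClient>;

export function createMockPrismaService(): MockPrismaService {
  return mockDeep<PrismaClient>();
}
```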

That single call provides a baseline mock, which can then be further tuned on an as-needed basis to account for each test case’s unique expectations.
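Putting the two helpers together, a test case ends up looking roughly like this. The paths, module names, and endpoint here are placeholders of mine, not the actual suite sonnet produced:

```ts
// users.integration.spec.ts -- illustrative sketch, not the generated suite.
import { INestApplication } from '@nestjs/common';
import { Test } from '@nestjs/testing';
import request from 'supertest';
import { PrismaService } from '../src/prisma/prisma.service';
import { UsersModule } from '../src/users/users.module';
import { UsersDomainModule } from '../src/users/users-domain.module';
import { createIntegrationTestModule } from './integration-test.module';
import { createMockPrismaService, MockPrismaService } from './prisma.mock';

describe('Users API (integration)', () => {
  let app: INestApplication;
  let mockPrismaService: MockPrismaService;

  beforeEach(async () => {
    mockPrismaService = createMockPrismaService();

    const moduleRef = await Test.createTestingModule({
      imports: [createIntegrationTestModule([UsersModule, UsersDomainModule])],
    })
      .overrideProvider(PrismaService) // mock ONLY the DB layer
      .useValue(mockPrismaService)
      .compile();

    app = moduleRef.createNestApplication();
    await app.init();
  });

  afterEach(async () => {
    await app.close();
  });

  it('GET /users/:id runs the real controller, services, and domain logic', async () => {
    // Tune the baseline mock for this test case's unique expectations.
    mockPrismaService.user.findUnique.mockResolvedValue({
      id: '42',
      email: 'jane@example.com',
    } as any);

    await request(app.getHttpServer())
      .get('/users/42')
      .expect(200)
      .expect((res) => expect(res.body.email).toBe('jane@example.com'));
  });
});
```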

So LEAN IN. Don’t just take the default or expected action, either. When the agent/model wants to delete a file, in particular, that should be your top cue to look more closely, figure out why it wants to do so, and think about whether there are better alternatives. In my case, I chose to completely ignore the attempt to delete the file and wrote a new prompt instead (effectively canceling the delete). That prompt has led to a very elegant solution for performing full integration tests of my controllers, via actual HTTP calls (vs. method calls), testing the real-world scenarios my web and mobile apps would be exercising.


On my word!!! Everything you have mentioned here has taken me on an 8 month journey, where documentation has helped me, as a non-coder/programmer, to Lean In and recognise that my own intelligence is vastly more astute than an LLM’s. I do appreciate that coding and logic are not my specific brand of intelligence; however, design and attention to architecture and infrastructure clearly are. So what works is making sure that I have a checklist before I start, and something that helps me to keep track of what I’m doing.

The step-by-step approach stops all that ‘feral activity’ and distraction that LLMs seem to love.

I’m not the type who loves the whole ‘development’ side of things. Frankly, I see it as a waste of my precious time, and so what I have learnt over these 8 months is how to build my own server and how to work with a hosting company that actually allows me the freedom to go straight into production and testing! And I use Cursor’s workspace modality to add all the parts of my overall project. It’s all in one place then, and the LLM is able to understand the full context of my stack, codebase, and overall project.

But I won’t lie and say it’s been a breeze, because frankly, the level of frustration has been unimaginable! Things I consider to be very simple are taken to a whole new level of complexity.

@jrista thank you for sharing your experience and wealth of knowledge. :+1:t4: