10 Pro Tips for Working with Cursor Agent

Here are some pro tips on how to use Cursor Agent. Add your own tips below and share them with the community.


Good video. The custom commands tip was pretty interesting, especially since it appears you can write a prompt with these commands embedded at different places, like copying and pasting common prompt segments. I’ll have to watch for opportunities to use that.

If you are trying to be frugal, using sonnet 4.5 thinking for something like “make the numbers smaller” is wasteful. GPT-5-mini or Grok-code-fast-1 could perform that request essentially for free.

The video shows them using sonnet 4.5 thinking for every request while being on the $20 plan. :sweat_smile:


@MidnightOak That does make sense, though switching models would cost more tokens than just staying with the same one for a single follow-up in the same thread.


Interesting. Does it cost more tokens to come back to sonnet 4.5 thinking after, say, using gpt-5-mini for a couple of requests? If so, maybe it would be better to duplicate the chat first, or make an entirely new chat for these small tasks. I suppose even the text-size request may have been negligible token usage for sonnet 4.5, but if a request required more work while still being “simple,” the tokens saved by switching to a less expensive model would be more pronounced.

Any change of provider (Anthropic/OpenAI/xAI/…) requires the full thread to be sent to the new provider as input tokens, since cached tokens only exist with the provider of the last model used. On short chats this may not be tragic, but on longer threads it would unnecessarily cost more.
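To make that concrete, here’s a rough back-of-the-envelope sketch. The per-token prices below are made-up placeholders, not real Anthropic/OpenAI/xAI rates; the only assumption carried over from the point above is that a cache hit is billed far cheaper than an uncached full re-read.

```python
# Rough illustration of why switching providers mid-thread costs more.
# Prices are hypothetical placeholders (USD per 1M input tokens); real rates
# vary by model and provider. Cached reads are often a small fraction of
# the uncached rate.
UNCACHED_INPUT = 3.00   # hypothetical uncached input price
CACHED_INPUT = 0.30     # hypothetical cached-read price

def request_cost(context_tokens: int, cache_hit: bool) -> float:
    """Input-token cost of one request that re-sends the whole thread."""
    rate = CACHED_INPUT if cache_hit else UNCACHED_INPUT
    return context_tokens / 1_000_000 * rate

thread = 120_000  # tokens of accumulated thread context

# Same provider: the thread context is a cache hit.
# New provider: no cache exists there yet, so it's a full uncached re-read.
print(f"same provider:  ${request_cost(thread, cache_hit=True):.3f}")
print(f"after a switch: ${request_cost(thread, cache_hit=False):.3f}")
```

With these illustrative numbers, one switch on a 120k-token thread costs roughly 10x the input price of staying put, which is why the advice above distinguishes short chats from long ones.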

New chat for small tasks if they are not related to current task.
If it’s part of current task then definitely keep current model.


Then I really wonder: when using Auto, if it inevitably changes the model, does that mean it inevitably adds a large token cost each time the model is changed?

@dbsx It usually stays on the same provider during a thread.

Oops :grimacing:

Auto mode rarely changes the model. If there were a unified cache, Auto would be ideal for Cursor.
Unfortunately, Claude model costs remain the same across generations, and they can’t switch to Haiku in Auto mode for obvious reasons. Perhaps with the release of Haiku 4 or 4.5, this might make sense.

I think I’ve heard something about automatic “diff” acceptance.
I’ve also heard about “multi-edit.”

With a proper implementation of these features and the right model configuration, the AI wouldn’t need to pause over trivial matters or spend calls on agent actions.
The remaining challenge is to make sure automatic diff acceptance is synchronized with the cache.

If I’m off base, I may have misunderstood your point.

When I said diff, I meant that after the agent makes edits, you can choose to “Keep” or “Undo” them.


I meant that whenever any file editing tool is invoked, continuing the dialogue counts as a new AI call with the full context (cached).

I’ve heard there’s a technique where the AI edits files in agent mode not via a tool call (like edit_file) but through a “diff” with auto-acceptance (as if edit_file had been used). Potentially, this would let it keep its output flowing uninterrupted and use fewer tokens.

Part of this issue is addressed by the multi_edit tool, but the AI rarely uses it, even when explicitly instructed.

Great tips, thanks @condor!


It may depend on the kind of work you are doing, but I don’t fully agree with using the smallest number of tokens possible. There are cases where that works, but context is CRITICAL, and not having enough is often the key reason why many agentic coders fail or end up with disastrous results. Over my last six months of using Cursor, I have found that trying to keep token usage as small as possible is sometimes the best way to get the worst possible results. In the last two months or so, roughly since Chat Summarization was first introduced (a little before), I’ve had numerous cases where creating new chats regularly and keeping context small was actually disastrous.

The more complex the task becomes (and that doesn’t necessarily mean a high volume of code; I’m talking about mathematical, conceptual, theoretical, and technical complexity), the more critical having the right context is. With Chat Summarization, on complex tasks, I have found the agent is significantly more effective and produces correct code (vs. just breaking code or producing useless junk). I’ve had a few chats going for around a week, because there was a non-trivial effort to provide all the necessary context initially (docs, web sites, theories, math formulas, etc.), but then also to keep track of CHANGES to the context that mattered and were fundamentally critical to the agent being able to continue work on the same code (which was not that much code: a Postgres UDF of about 500 lines in total).

So I would be careful with the frequent adages that “aim for the smallest context possible” or “start new chats frequently” are really best practice. Context, which is the agent’s “short-term memory” for lack of a better term, is often the single most critical piece for the agent to BE EFFECTIVE.

Again, this may depend on the kind of work you are doing, but I find more and more that my chats live longer as I get better results. Chat Summarization completely changed my approach here. There are certain limitations you can eventually hit (technically, summarization is lossy compression), but there are also ways to get things back on track again. There ARE indeed certain classes of tasks where you don’t need a lot of context and can start new chats frequently; I find this more the case with frontend apps, especially Next.js. For backend work and database-level tasks that involve the various complexities of those tiers, though, context rules, and starting new chats prematurely can be a death knell.

Well, smallest can also mean smallest suitable for the task :slight_smile: