A cursor-small agent

The current models available to Agent are great for ‘big hauls’, but it would be nice to be able to switch to non-premium models such as cursor-small for very small requests from the agent, so I don’t use up my premium calls on something that simple.


While this is definitely doable, I think in production it might be less practical than it seems.

For two primary reasons:

  1. Agent calls often require a lot of thinking and iterative steps, which leads to a rapidly growing chat history/context. Smaller models struggle with this more than larger ones, having difficulty focusing on what’s important.
  2. In agentic workflows, things often operate in chains, so small differences in per-step error rates compound into large differences in the final result.

As an example, say that Model Big A has an error rate of 5%, and Model Little B has an error rate of 10%.

That means the chance of Big A being right on one LLM call is 95%, and the chance of Little B being right is 90%.

In one LLM call, not a huge difference.

But if you chain that over 5 steps:

  • Model Big A’s success rate: 0.95^5 ≈ 77.4%
  • Model Little B’s success rate: 0.9^5 ≈ 59.0%

Or over 10 steps:

  • Model Big A’s success rate: 0.95^10 ≈ 59.9%
  • Model Little B’s success rate: 0.9^10 ≈ 34.9%

You can see how small differences in error rates compound over multiple steps, resulting in a large difference in the final result.

After 10 steps (very easy to hit in agentic workflows), that 5% difference in per-call error rate leaves Little B at only ~58% of Big A’s success rate (34.9/59.9).

This is obviously just a conceptual example, but it shows how important accuracy is as you chain prompts.
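The arithmetic above is easy to reproduce. This is just a sketch of the same conceptual example: it uses the illustrative 95%/90% per-call figures and assumes each step in the chain succeeds or fails independently.

```python
def chain_success(per_call_success: float, steps: int) -> float:
    """Probability that every step in an n-step chain succeeds,
    assuming independent, equally reliable steps."""
    return per_call_success ** steps

for steps in (1, 5, 10):
    big_a = chain_success(0.95, steps)      # Model Big A: 5% error rate
    little_b = chain_success(0.90, steps)   # Model Little B: 10% error rate
    print(f"{steps:>2} steps: Big A {big_a:.1%}, Little B {little_b:.1%}, "
          f"ratio {little_b / big_a:.0%}")
# last line printed: "10 steps: Big A 59.9%, Little B 34.9%, ratio 58%"
```

The real numbers for agentic error rates are messier (steps aren’t independent, and errors can be recovered), but the compounding shape is the same.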

With all that said: as SLMs get more powerful, this will become increasingly viable. So it’s definitely something that could be worthwhile in the dev pipeline in the medium term.

I figured that was the current reason, and I understand it. My one small counterpoint: while it isn’t exactly equivalent, for these chains of small jobs I’m currently using Chat, which works fine except that I have to manually copy everything over myself. So there’s some evidence that a small model can handle this type of workflow in the right setting. Maybe there could be configurations that add extra guardrails when a small model is running compared to a big one? (e.g. only enable YOLO for larger models.)
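To make the guardrails idea concrete, here is an entirely hypothetical sketch; none of these setting names or model policies exist in Cursor today. The idea is a per-model policy table the agent consults before each action, so smaller models run with shorter chains and more confirmations:

```python
# Hypothetical per-model guardrails -- illustrative names only,
# not real Cursor settings.
GUARDRAILS = {
    "big-premium-model": {"yolo_mode": True,  "max_chain_steps": 25},
    "cursor-small":      {"yolo_mode": False, "max_chain_steps": 5},
}

# Conservative defaults for unknown models.
DEFAULT_POLICY = {"yolo_mode": False, "max_chain_steps": 1}

def allowed(model: str, action: str, step: int) -> bool:
    """Return True if the agent may take `action` at chain position `step`."""
    policy = GUARDRAILS.get(model, DEFAULT_POLICY)
    if step > policy["max_chain_steps"]:
        return False  # small models get cut off before long chains drift
    if action == "auto_run_command" and not policy["yolo_mode"]:
        return False  # YOLO-style autonomy reserved for larger models
    return True
```

This is just one shape the configuration could take; the point is that the budget/accuracy trade-off becomes a per-model setting rather than an all-or-nothing choice.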

That’s a good idea. I think this will be possible within this year, given the rapid progress of SLMs (e.g. Microsoft’s recent Phi-4 announcement).


A proper sandbox account environment is needed when working in the paradigm of AI_Agentic_Compute; that is, you’d do well to have:

Prod account: the primary full_power Dev/DevOps account one uses. Then a separate account that is the sandbox account, with a COPY of the codebase and a MIRROR of context, so you can run sandbox what_ifs against the same code, with the same CONTEXT, yet in sandbox mode where it’s nerf’d and can’t upstream anything.

This would keep the request budget separate and, more interestingly, make it trackable and quantifiable: a valuation of certain aspects of interaction with Agent, and of the outcomes of Curious_Ponderance (“wouldn’t it be cool if…” and “ooh, now add ALL THE THINGS!”).

I’ve personally tested opening two separate Cursor sessions on the same folder, then had Cursor_Session_02 read the .vscdb tables that Cursor_Session_01 is writing into the vscdb, hence being technically able to MIRROR context straight from the SQLite vscdb. You can look at specstory’s implementation for how it accesses the vscdb…
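For reference, a minimal sketch of the read-only side of that mirroring. The store file name (state.vscdb) and the key/value table name (ItemTable) are assumptions based on inspecting a local install, not a documented API, so verify them against your own workspaceStorage folder before relying on this:

```python
import sqlite3

def mirror_context(db_path: str, key_prefix: str = "") -> dict[str, bytes]:
    """Read key/value rows from another session's SQLite store.

    Assumes a key/value table named ItemTable (key TEXT, value BLOB),
    as observed in Cursor/VS Code workspace storage -- not a stable API.
    """
    # Open read-only so Session_02 can never write into Session_01's store.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT key, value FROM ItemTable WHERE key LIKE ?",
            (key_prefix + "%",),
        ).fetchall()
    finally:
        conn.close()
    return dict(rows)
```

Opening with `mode=ro` is the important part for the “nerf’d, can’t upstream anything” property: the sandbox session can mirror context but cannot corrupt the primary session’s database.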

But this type of setup needs some thought and needs to be bullet_proof’d…

I was originally looking at how to do Agentic_File_Locking, but I never got to the bottom of that rabbit hole yet…

Also, .cursorignore has been seen to not be 100% trustworthy…

So a combination of all of these Cursor attention/focus/rules projects will be needed in the next few months…