NOTE: LONG POST!
I first started using Cursor around four or five months ago. The first time you use it, it's kind of magical, seeing what it can do and how easy it seems to prompt the agent to create complicated things. I started by creating Angular and Next.js web apps, and with Next.js in particular, it's still pretty amazing how the agent can tackle very complex prompts and do a wonderful job.
Over time (and not all that much time, it seemed), the novelty wore off and I settled into the job. While what Cursor, its agent, and the LLM can do is still amazing, it's not quite as amazing as the initial take. Especially with backend code, where architecture and design matter a lot and you have very specific needs, and particularly once your codebase gets past certain size thresholds (even if it's all agent-generated code), the challenge of keeping the agent and LLM on the rails and on task, without them going "scatterbrained" and bulldozing your codebase all the time, definitely increases.
To combat the challenge, I started operating in a particular mode: Plan, Research & Refine, Act, Complete, Terminate. Or PRACT, for short.
Planning is becoming a much more significant part of the process, as is refining the plan, getting it added to my ticketing system (Linear) via MCP (a rough sketch of that hookup is below), and often writing out a local plan .md document for local reference and use. Plans are usually multi-phase, or multi-task under a story in Linear.
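For anyone wondering what the Linear-via-MCP hookup looks like: Cursor picks up MCP servers from a `.cursor/mcp.json` file in your project (or a global one in your home directory). Mine is roughly like the sketch below; the server URL is from memory of Linear's MCP docs, so verify it against their current instructions before relying on it:

```json
{
  "mcpServers": {
    "linear": {
      "url": "https://mcp.linear.app/sse"
    }
  }
}
```

Once that's in place, the agent can create and update Linear epics, stories, and issues directly from chat, which is how my refined plans end up in the ticketing system.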
Acting is generally just me guiding the agent through a plan, either by working through a Linear epic and its child stories, or by working through a multi-phase plan one (or maybe a couple of) steps at a time. I may involve multiple chat tabs while acting: one actively working through a phase or two, another being prepared for the next, and a third often open for research or review tasks related to the work being done, or other sideband things (like reviewing someone else's work).
Completion often comes iteratively as well. After I act, I'll "complete" by tackling the mundane stuff: fixing linter errors, fixing formatting, running and verifying (and fixing) tests, etc. Then I'll commit before moving on to the next "Act" step for the next phase of the planned work.
Termination is where I clean things up and get ready for the next big body of work: closing out chat tabs, terminals, files, etc., to get back to a clean slate so that junk from the last body of work isn't hanging around confusing me or interfering with the next one.
OK! So, I now have this process. It's becoming fairly ingrained, as keeping the agent and LLM on the rails and on task, without meandering or bulldozing all over my codebase, is a more challenging task when you actually have a decent-sized codebase. Anyone who's done agentic coding for a while should know: you can go from zero to quite a lot, rather quickly!
A couple of weeks ago (maybe not quite), I was sitting watching Claude 4 Sonnet "think" for rather long periods of time before it would actually do any work, then think some more, then do some more work, etc. I was poking around Cursor's docs pages and came across a statement that "thinking models cost twice as many requests," or something to that effect…which suddenly gave me REAL PAUSE!
I have just had my first REAL month (and a half, almost) of really HEAVY agent-centric work. I do EVERYTHING in the agent now. I burned through all the requests Cursor Pro offered, then started burning pay-as-you-go tokens, then upgraded to Cursor Pro+ when I discovered it was an option. I burned through ALL of that, it seemed, in less time than Pro, and ended up racking up well over $100 in additional pay-as-you-go costs. Then I discovered that I was effectively getting, for all intents and purposes, HALF the number of requests I thought I was, because all my work was using a thinking model! I ended up upgrading to the Cursor Ultra plan last week.
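To put numbers on why that gave me pause: if the docs are right that a thinking model burns two requests per use, your effective prompt budget simply halves. A quick back-of-the-envelope sketch (the 500-request figure is the commonly cited Pro quota, not something off my bill, so substitute your own plan's number):

```typescript
// Back-of-the-envelope: what a "thinking models cost 2x requests"
// multiplier does to a monthly request budget.
const monthlyRequests = 500;   // assumed plan quota (commonly cited Pro figure)
const thinkingMultiplier = 2;  // per Cursor's docs: thinking models cost 2x

const effectivePrompts = monthlyRequests / thinkingMultiplier;
console.log(`Non-thinking model: ~${monthlyRequests} prompts/month`);
console.log(`Thinking model:     ~${effectivePrompts} prompts/month`);
// Non-thinking model: ~500 prompts/month
// Thinking model:     ~250 prompts/month
```

Half the budget for the same number of prompts is exactly what my billing felt like in hindsight.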
So I started wondering if I should switch to Claude 4 Sonnet NON-thinking. I figured: why have the model "think" when I am already doing a lot of the thinking? Further, I'm a heck of a lot smarter than an LLM; the only advantage an LLM actually has is knowledge. Why would I rely on an LLM to "think" (an act it isn't really capable of; it's really just a feedback loop) when I have over 30 years of programming experience and 27 years of in-industry career experience?
So I switched to the non-thinking model. Whoa! Suddenly it seemed like things were happening so much faster. I'd write a prompt, and bam, I'd get an answer right off the bat. I might write a more complex prompt that required some grepping, other investigative file work, or git-history trawling. But it just HAPPENED. Right off the bat. Beyond the initial "Generating…" lag (which I believe is the agent itself preparing the full prompt + system prompt to send to the LLM, or things of that nature), when the model was actually invoked, it just responded. Boom. Bam. Work done! Next?
Another thing about the non-thinking C4 Sonnet model: it did NOT seem to do a worse job than the thinking model. On the contrary, it just seemed to do what I asked, with less deviation or meandering from the specified task than the thinking model. Given that I plan out what I need done in a fair amount of detail, having the model just do what I asked and get it done immediately was really nice!
I've been working this way with non-thinking models for about, maybe not quite, two weeks now. Well, at least, I was, until the last two days and GPT-5. Once I started playing around with GPT-5 (which was an unpleasant experience for other reasons; I'm honestly not sure whether to blame the model or Cursor 1.4, as they both seemed to coincide, and something seems…not quite right ATM), the whole "thinking" process came back. GPT-5, when it acts, is a bit faster than C4 Sonnet. However, all the "thinking" periods counteract that, and I once again feel like I am being throttled by thinking. I am not really a fan of that anymore. The novelty of seeing your model "think" has COMPLETELY worn off, and it now just feels like wasted time and lag.
GPT-5 is "free" right now (or supposed to be; I've read a number of reports here that it seems to be billing some people), so the "thinking models cost twice as many requests" factor shouldn't actually be taking effect…BUT it is still, as far as I know, a factor! You pay one "request" for a thought cycle and then another "request" to get the actual output. At least, that's what the documentation seemed to be saying. I don't like that much at all! Especially since all the "thinking" doesn't really seem to enhance the results. At least, the difference between C4 Sonnet Thinking and C4 Sonnet seemed to be that the thinking model meandered more, and would go off task or not quite do EXACTLY what I needed more often than the non-thinking model.
So I'm now in a conundrum here. Is a thinking model actually, truly…better? Perhaps it has to do with how you work? My PRACT(ical) approach involves me doing the thinking, researching, planning, and refinement of the plan of action I want the agent and model to enact. Perhaps that has negated the benefits of a thinking model?
I am curious whether any of you other heavy Cursor Agent users out there do anything similar to me with the self-directed planning process, and what your experiences with thinking vs. non-thinking models might be. It looks like ALL GPT-5 models are "thinking" models. I am not sure what I think of GPT-5 yet (maybe the issues are just Cursor's initial integration with it), but…it does give me pause.
My current experience is that the thinking models meander and bulldoze more than the non-thinking ones; that the thinking models are effectively twice as expensive (you spend twice as many requests using one!); and that the "thought" processes slow things down, which, when you are already doing your own planning, can really slow things down overall. I think it would be sad if all new models were ONLY thinking models, as I find they may not be as effective or as fast as a non-thinking model, at least under certain use cases or circumstances…