At a certain point the models seem to start smoking bad ■■■■. Some earlier (GPT-5), some later (Claude). Super easy prompts are completely misunderstood and executed badly. Fixing things after bad answers then takes more time than doing the fix in Notepad without Cursor.
Do we have to start a new chat at a certain point, even though the work is not completed? Is it because of changing models during the chat? Or do we have to slot in more money (happy to do so if the models stop smoking bad ■■■■)?
Hi @Peter_Jaeger. Yes, context adds up over time and may confuse any model. At that point it may be better to start a new chat.
Both Sonnet 4 and GPT-5 have limits when it comes to handling conflicting information or filtering relevant from irrelevant information.
This is not a cost-related effect on our side, since we charge usage as reported by the AI providers.
Here are my recommendations but feel free to share more about your setup and how you use Cursor. Happy to give more focused help.
Separate task planning and task implementation.
Example:
Use GPT-5 for planning, incl. writing an implementation plan or FRD as a .md file (see the sketch after this list)
Implement changes with Sonnet 4 by using the implementation plan or FRD
Use focused and short rule files (avoid conflicting info or many negative statements); an example sketch follows further below
Use the Auto model or less performant models for simpler tasks to save cost (committing, checking logs, …)
Keep chat focused on a single specific task.
Any follow-up tasks should be done in a new chat.
You can use @ to reference a previous chat if you need some info from there for better focus.
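For the planning step above, the exact shape of the plan matters less than having one the Agent can follow step by step. Here is a minimal sketch of what such an implementation plan / FRD .md could look like (the section names are just an example to adapt, not a fixed template):

```
# FRD: <feature name>

## Goal
One or two sentences on what the change should achieve and why.

## Scope
- Files / components expected to change
- Explicitly out of scope

## Implementation steps
1. Step one, small and verifiable
2. Step two
3. ...

## Acceptance criteria
- How to verify the change works (tests, manual checks)
```

You can then start a fresh chat with Sonnet 4, mention the .md file, and ask it to implement one step at a time.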
Note that models may ‘hallucinate’, i.e. produce plausible-sounding output that is inaccurate. This has improved over time but may still occur.
Avoid attaching files directly; let the Agent know which ones are relevant by just mentioning the file name.
Keep rules focused and short.
Follow programming best practices and avoid preferences that go against them.
If the AI makes a mistake, it can be beneficial to go back to the step where it went wrong and adjust the prompt there to avoid the mistake, instead of continuing in the chat and directing the AI to correct the issue. Once the wrong direction has been taken, it stays in context and may further influence the results.
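To make the rule-file point more concrete, here is a minimal sketch of what a short, focused rule file could look like for a JS/React project. The file name and the individual rules are only illustrative (assuming a plain-text rules file such as .cursorrules or a rule in .cursor/rules), not a recommendation for any specific setup:

```
# Project rules (illustrative)

- Stack: JavaScript, React, Node.js, Tailwind CSS
- Prefer small, focused components; one component per file
- Use the ESLint/Prettier config from the repo; do not reformat unrelated code
- Before multi-file changes, write a short plan and wait for confirmation
- Keep changes limited to the files mentioned in the prompt
```

A handful of clear, positive statements like this is easier for a model to follow consistently than a long rule set full of exceptions and “do not” items.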
> If the AI makes a mistake, it can be beneficial to go back to the step where it went wrong and adjust the prompt there, instead of continuing in the chat and directing the AI to correct the issue.
A big second to this. Avoid any offers to ‘revert’ at all costs IMO.
Thanks @condor, most of it I/we are doing already. The hallucinated answers are usually still within the bounds of the used context (way less than 200k), but you might be right about the length of the chat. Sometimes it also ignores rules set before (like lovely linting…).
With this one:
> If the AI makes a mistake, it can be beneficial to go back to the step where it went wrong and adjust the prompt there, instead of continuing in the chat and directing the AI to correct the issue.
GPT-5 - I am almost convinced that this is some kind of router setup that redirects queries to a different model based on the context. I have been working happily with it at low context length, only for it to suddenly have a huge change in personality and start changing everything (documentation style, code style, disagreeing with what it had written earlier in the chat and then rewriting it in a completely different style for the same functionality). There is something screwy about this model, whether it’s Cursor’s implementation of it or just how the “model” works.
Claude 4 - Similarly, I have noticed that at certain times of day (9pm onwards for me in Australia) this model suddenly loses a bunch of its capabilities.
I usually sit quite happily chugging along at around 300k context, and I manage context in other ways (FRD, PRD, implementation tracking, etc.), then as soon as it gets to a particular time - boom - it’s lobotomised.
This again could be Cursor’s implementation or it could be the model provider - I can’t tell, but it has got to the point where I don’t use GPT-5 and I basically stop using Claude 4 after 9pm for any major changes.
Funny to hear, @the1dv, that you stop using Claude after 9pm Aussie time. As you are 12h ahead of me, we seem to hit the issues at a very similar time. When I restarted the work yesterday at around 3pm GMT+1 with the same prompts, I got way better results up until about 10pm, when the models started smoking again. On GPT-5, I also observed that when using GPT-5-High-Fast the results are 10x better than with plain GPT-5, although it should be the same model, just “using OpenAI’s fast priority processing at 2x the price.”
btw: I’m using a simple JavaScript/React/Node/Tailwind stack.
There’s definitely something going on with it. GPT-5 is fine if you can one-shot something and don’t need to maintain context; it just switches personality so quickly it’s hard to catch it.
OK, that is strange and I would like to investigate further so we can solve this. We directly route each model that you select. As I have no insight into the AI providers’ capacity in different regions, I can only guess that it may be related.
Could you post a Request ID with privacy disabled so we can look into the details, both for a case where it’s working well and a case where it’s not? Cursor – Getting a Request ID
Is there any other difference apart from the time? (different project, machine, task type, …)
More details would help us narrow this down.
For myself, I also work with GPT-5 and Sonnet 4, and apart from the model differences the performance has been the same on my personal setup.