Hi everyone,
I’ve been using the AI coding tool Cursor recently, but I ran into an issue. When I ask Cursor to modify my mobile code, it sometimes messes up and also changes my PC code by mistake. Is there a way—maybe a certain mindset, or (better) an MCP server—to help Cursor reduce these kinds of wrong edits? How can I prevent this from happening?
As a total noob, I ran into this kind of thing for a while with Claude 3.5 Sonnet, where it would jump into files I didn’t want it to touch. I solved it by telling the AI exactly what I wanted it to do and what I wanted it to avoid, asking whether it understood and agreed, and having it repeat the instructions back as it understood them. Problem solved.
No, it isn’t. This is factually incorrect, and many advanced users actually advise against it because it has no tangible benefit. You are asking a calculator to calculate how you might ask it to calculate better - something it does not see. Do not do this, for the love of God.
Then that is not attributable, nor can it be attributed, to the model. The model doesn’t even see its own system prompt - it is embedded on purpose. Yet it will happily feed you one. Generation bias is strong; you can do what you want, but that only goes so far. A word of caution, though it seems to be working for you, so good on you. Maybe I’m off, but I feel it’s more lucky overlap in those cases than the model embodying itself, because it is not trained on content explaining itself, just flavors of itself - nor does it generate inferences with any mechanism for that level of environmental awareness. It is the same reason it is dangerous to call model outputs hallucinations - there is no grounding there.
Also, I consider it risky to present behaviors I’ve tried numerous times, in numerous mediums, and found to hold only in the most basic cases as if they were generally useful when trying to form useful outputs for your goal. I told people why it is probably pseudo-scientific behavior, and it has ramifications at the embedding level, which already doesn’t matter because we have no idea what our query looks like as an input tensor on the provider’s side. All in all, you may have your opinions, but I research in this field. There are parts of what you say that may yield you utility, but confirmation bias still exists, and I have seen your anecdote play out horribly, and to the tune of my money… just saying. Pseudo-science leads us to waste money chasing superstition. A utility-first mindset is useful, but it has a boundary, and a very hard one, T1000. That being said, you say it works for you, and I’d be curious to hear your techniques specifically about this issue, not adjacent abilities in morphing the user query in some uninterpretable way to improve user experience, because we’re talking principles and utility in concert - unless I’m missing something obvious. We die in your dependency hell when the model stops exhibiting that convenient emergent behavior. That overlap is everywhere. Its outputs are always convenient and never with intent, intent only being possible by carrying context and an outside implementation making the model aware of itself, which it can only mathematically simulate, as it has no environment beyond semantic language.
You make a very good point. Not all approaches are attributable to the model at all. Some are just different ways for us to use the tools, applying best practices to get better results.
Asking the AI, in the context of a task, to improve a prompt may lead to false assumptions or just stochastic responses that sound more or less pleasing while not changing anything about the result. I’m not disagreeing about the dangers of this.
In the user’s case, the AI would likely not have responded with “You forgot to tell me to change only the mobile version”, even though that is the most applicable statement.
Models have changed drastically over the last year, including their capabilities. Prompting techniques have advanced too, but have also become simpler at the same time. Basically, newer models do not need prompts as long to understand the user’s intent, process, and goal. Where e.g. Claude 3.5 Sonnet or earlier OpenAI models required more detailed task descriptions, the latest Sonnet models have been trained on more of such processes, including in coding, which leads to higher-quality handling by the AI. This is on top of further advances from more recent IT content being present in the training data.
Detailed prompts that worked for Claude 3.5 Sonnet, since they were optimized for its abilities, no longer work on Claude 4 Sonnet (they also didn’t work on 3.7).
What you say about confirmation bias is clearly visible when people ask e.g. Claude 4 Sonnet Thinking which model it is: when it says 3.5 after thinking about the request, they complain here that they were misled, not understanding that they misled themselves by simply trusting the AI’s response.
It helps to understand how AI models work and how this impacts prompts.
The techniques I use are widely known, well-working best practices, or simply things that have proven to work. However, there is a big difference between prompts optimized for a certain task when using AI in projects and the prompts we give the AI in Cursor.
As Cursor has its own internal prompts to tackle tasks, our prompts may not need to be as detailed or provide as much guidance.
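For the original question about Cursor touching the PC code during mobile edits, one concrete option is a scoped project rule that tells the agent which directories a task may touch. Treat this as a rough sketch: the exact file location and frontmatter fields may differ between Cursor versions, and the `mobile/` and `desktop/` paths are placeholders for your own layout (e.g. a file like `.cursor/rules/mobile-only.mdc`):

```
---
description: Scope mobile tasks to the mobile code only
globs: mobile/**
alwaysApply: false
---

- When a task mentions the mobile app, only edit files under `mobile/`.
- Never modify files under `desktop/` (the PC code) unless the user names them explicitly.
- If a change seems to require touching `desktop/`, stop and ask first.
```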
Terms like hallucination are human characteristics attributed to an AI, not a scientific representation of what is happening.
YES! OK, so we agree on most points then. I will try some of your suggestions - but that’s the touchpoint I think I struggle with: at some level, as users, we require some “grounding”, right? Like - we can’t even see the system prompt - why? I wouldn’t ask the model for something that’s embedded in it. The word “embedded” is becoming far more literally relevant than I cared to see, and fast, because the implication is an unseen directive that serves… someone. Not tinfoil-hat stuff, just in the vein of our convo: does that line of thinking, and the blind spots it creates, concern you? Often, yes, I find some good prompting, but I don’t know when the rug gets pulled - let alone to what extent (where it is pulled), or what the system looks like (the living room). Crappy analogy, but you put it very well. I do resonate with “just prompt it” - I just fear that piece of the pie closing down in high-level ways we might need at this level of dev.
EDIT: Thank you for your comment tempering the word hallucination. It drives me nuts when you look at the math and trace it back. It’s just wrong or right, like a calculator - but that’s it, right? The human metaphor can be used in many ways where currently we just see a ?. To lie requires context; otherwise you are just incorrect. And context and expectation are built experientially, in interaction - models can’t do that. They mimic it so well that we have these conversations on forums legitimizing our only avenue to tangibly grasping what happens inside these things. But most of my issue with the current implementation, and why I fuss, is that this type of feedback, I’m starting to realize, Cursor may be taking to heart. When it’s none of their business until they shell out some revenue to train models and host with the big boys. Until then, the ? is unacceptable. I know AI - transformers, CNNs, ANNs, RNNs, whatever. It’s all the same underlying theory - and I am all the same useless, because the ? prohibits me from using what I know about embeddings, models, architectures, and what I’d do differently if I had 30 grand, to save myself mere API cost. It’s why I remain sharp on this. At some point, the superstition and empirical illusions around an iterative, checkpointed input-output machine become incorrect and costly.
When you look at the mechanics, either the model is told its name and its simulated embodiment after the fact, or it is not. But it cannot physically or mathematically be trained in that contrived way; it would destroy the model’s ability to perform token unmasking. The reality we’re describing doesn’t exist. It’s not possible currently, unless we’re being massively lied to or the whitepapers are restricted - it disobeys information theory.
Sure, most people not familiar with how such systems work would not assume that there are one or more system prompts at different levels (AI provider, Cursor), whether passed as text to the API, used in a separate orchestrator, or embedded (as in training, vs. embedded as in the call to the GPU).
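To make the “passed as text to the API” case concrete, here is a minimal sketch using the OpenAI Python SDK. The model name and prompt strings are placeholders, and tools like Cursor layer their own prompts on top of whatever the user types:

```python
# Minimal sketch: a "system prompt" is just another message string sent with the request.
# Assumes the openai package is installed and OPENAI_API_KEY is set; model name is a placeholder.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system", "content": "Only edit files under mobile/."},  # provider/tool-level instruction
        {"role": "user", "content": "Update the login screen layout."},   # the user's actual request
    ],
)
print(response.choices[0].message.content)
```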
AI is a new form of technology interface, and with the recent advances it is natural to be concerned about how things change.
For example, there is the AI’s more frequent refusal to perform tasks, its tendency to be overly helpful, or to make things sound more convincing than they should be.
I think we are still early in AI advancements, so interactions will need to change with time, hopefully not just on the human side.
What’s interesting about AI is that it is not entirely like a calculator: while it uses math for its ‘calculations’, the outcome is partially randomized during training and partially at inference. The word randomized fits in a literal sense as well as a figurative one, e.g. the passing of tokens through the network layers both during training and inference.
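To illustrate the “partially randomized at inference” point, here is a minimal sketch of temperature sampling over a toy next-token distribution. The tokens and logits are made-up numbers for illustration, not any real model’s output:

```python
import numpy as np

# Toy next-token probabilities produced by a (hypothetical) model for one step.
tokens = ["mobile", "desktop", "both"]
logits = np.array([2.0, 1.0, 0.1])

def sample(logits, temperature=0.8, rng=np.random.default_rng()):
    # Temperature rescales the logits before softmax; sampling then draws randomly,
    # so the same fixed weights can still yield different outputs on different runs.
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)

print(tokens[sample(logits)])  # may print a different token on each run
```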
AI providers have started to counter phrases like hallucination or lie more strongly than they did two generations ago, or even just one.
Note, I’m talking about personal (professional) experience and knowledge, not necessarily how Cursor actually works or what processes actually exist, as I’m not privy to those.
This is my point. With all due respect, transformer technology is 8 going on 9 years old. It is only new to people who are being fed the narrative that it is new. The math is simple. And I’d personally never ask a trained embeddings model about itself when it is mere inference in the end, where the content:
- couldn’t hope to be mathematically impactful without destroying the learned state dict / distorting goal orientation in non-human-interpretable ways,
- literally cannot be reproduced as anything but an INFERRED version, delivered confidently.
It is mathematically incorrect, and impossible, to claim that I can train billions or tens of billions of gated, non-linear transformations and convolutions or reorganizations to exhibit specific behaviors in a way that doesn’t inflect the model’s entire ability to perform its token-unmasking task. Everything we do with LLMs is a scale benefit; we are trying to reduce that with empiricism, and, gasp, we don’t find any in our very convoluted guess-and-check, brute-force intelligence model. These stories are just that: not real. There are numbers, and many hard workers, behind this magic that can explain it.
Whatever the landscape is, and I know I can define it, it needs to be defined by the people who offer this new tech to people who think it is genuinely new. The concepts underlying it are math whose roots go back to antiquity. It’s input and output; anything else is not correct. We can’t even evaluate these black boxes with respect to predicted interaction effects with us in that model, because so much is not told to us; I actually feel stupid even exercising this convo. It just feels very “how cute” to me. The uninterpretable portion of the system is easy to abuse, easy to blame, and can act non-culpable in insane ways. Who do we turn to to demand value for our money? Do we simply quit? That’s a bit “no, wtf” in my opinion.
These things are not new. Arxiv exists. The papers are hard to understand because this is high-level abstraction outside our human bounds. We need to start treating it with proper respect - we cannot make empirical claims that work sometimes and not others when the hinge point is biased, in literally all forms, toward contrived output on purpose. These systems are not designed to be interpretable at all. The trained models Cursor stewards are, believe it or not, the most constant part of the entire setup. Do you get such wildly different outputs from a browser version of the same model? No, certainly: you get worse output, plus clever terminal/invisible-terminal regex/auto-coding that works, or doesn’t, depending on Cursor’s effort and investment level. When most of my variable costs cannot be attributed to what I know to be a fixed model state dict, which I could import using Hugging Face, PyTorch, or anything else given the capital to do so, you find these conversations boring, because in the end it’s the size of the machine and we’re all milking a temporary circus. Just my 2c. I’ve bled the math. What I’m seeing is not AI confusion, nor is any of this conversation practical, because it falls through when our high-level assumption becomes invalid due to some high-level empirical conclusion we mistakenly conned ourselves into thinking applies through all the discontinuity of logic.
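For what it’s worth, the “fixed model state dict” point can be seen directly with any open-weights checkpoint. A small sketch using Hugging Face transformers; the `gpt2` checkpoint is just an example of a small open model, not anything Cursor actually runs:

```python
# Sketch: the weights on disk are a fixed artifact; variation at inference time
# comes from sampling and from whatever the host wraps around the model,
# not from the checkpoint changing between calls.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # example open checkpoint
state = model.state_dict()

# Inspect one tensor to show the parameters are deterministic across loads.
w = state["transformer.wte.weight"]
print(w.shape, torch.sum(w).item())
```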