cheetah: stealth model from an undisclosed provider. Translation: we have added a model and enabled it without your permission, we are sending your code to a company, and we are not going to tell you which company. We won’t actually tell you about any of this; we’ll let you discover it yourself, when you inevitably waste hours wondering why the AI is no longer working effectively.
Thanks, Cursor. You have put my code at risk and wasted my time with this garbage. What’s more, this is not the first time you have done this: you regularly turn on obscure models, and each time I eventually realise that the reason the AI responses have degraded is that you have pulled this trick yet again, once more wasting my time and sending my code to some unknown company.
That’s what seems to be happening, and it is actually a huge security issue: we avoid certain model providers because of the risk to our code. Not to mention that I don’t want to be a beta tester for some random model when I have deadlines to meet.
Bro, you really think your code is some crown jewel everyone’s dying to steal? Have you actually looked at Cursor’s legacy privacy mode? The whole point is that your code doesn’t go anywhere. Cursor literally guarantees privacy, that’s like the main feature, so relax.
And seriously, you think there’s a team of ninjas out there just waiting to “steal” whatever you’re hacking together? Let’s all get a grip. If you don’t like it, just don’t use it. Not exactly rocket science.
This paranoia’s got more plot twists than a soap opera.
Not everyone using Cursor is simply using it to make software; it’s also an advanced scientific utility. Privacy-preserving access to a wide range of models is a reason to use Cursor even outside of software engineering.
Many use cases require “information lockdown”, including when the risk is your work-in-progress research leaking before you’ve had the chance to publish it. Nobody has to actively try to steal it for one’s work to be “stolen” by the AI; it’s a real thing.
The risk is that the LLM may learn your formulas and present them to others as if they were the model’s own intuition, potentially ruining any chance of publishing, depending on how long the research phase takes.
This is a very serious concern that I also worry about day to day, and I have to be very careful not to allow leakage. It has nothing to do with paranoia.
Fair point about research security - that’s a legit concern, especially when publishing timing matters.
But here’s the thing: if you’re worried about leakage, Cursor’s “privacy” is kinda half-baked. You’re still hitting cloud APIs unless you go full local. Their privacy features are nice for general use, but not really “information lockdown” level.
For actually sensitive research, wouldn’t you want:
Fully local models (Ollama/LM Studio; minimal sketch below)
Air-gapped systems
No cloud touching your data at all
I get the concern, just saying Cursor might not be the right tool for that threat level.
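To make the fully local route concrete, here’s a minimal sketch of querying a locally hosted model through Ollama’s HTTP API, so prompts and code never leave your machine. It assumes Ollama is running on its default port and that you’ve already pulled a model; the model name and the helper function here are just placeholders, nothing Cursor-specific.

```python
# Minimal sketch: call a locally hosted model via Ollama's HTTP API.
# Assumes Ollama is running on its default port (11434) and a model
# such as "codellama" has already been pulled with `ollama pull`.
import json
import urllib.request

def ask_local_model(prompt: str, model: str = "codellama") -> str:
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # one complete response instead of a token stream
    }).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",  # localhost only; nothing leaves the box
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(ask_local_model("Review this diff for anything that could leak unpublished results."))
```

Run that on a box with no outbound network access and you get something much closer to actual lockdown, with the obvious trade-off that local models are weaker than the frontier ones.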
Personally, I don’t think there are local models capable of assisting with certain types of research—they can’t distinguish new science from noise. GPT-5, for instance, can verify complex theorems, analyze my codebase, ensure the experimental implementation aligns with the theoretical proof, and sift through data, all in one seamless workflow. I haven’t found any local model that pulls that off, period.
This is precisely what makes Cursor so valuable: integrating many models like Claude, GPT-5, and Grok approximates Solomonoff-like induction, yielding a broader intelligence that no single model can achieve alone.
Of all the codebases I work on, the smallest has had 15 man-years of work put into it, and none is valued at less than $1m, so yes, they are valuable. It seems it’s you who doesn’t understand how Cursor works: your code snippets do get sent to the LLM vendors. Do you honestly think Cursor runs all the AI models itself?
Just because you don’t have valuable code, it doesn’t mean no one else does.
Obviously, you can’t fully trust LLM providers even if you enable privacy mode — you’re still sending valuable code and just hoping it won’t be stored, used for training, stolen for internal purposes, or accessed by foreign intelligence.
Even employees can steal code and other valuable data from their jobs, right? So when you “hire” cloud-based AI programmers, the responsibility ultimately falls on you.
Exactly, which is why I don’t appreciate finding that Cursor has enabled and is using a model described as “cheetah: stealth model from an undisclosed provider”. I trust Anthropic, I certainly don’t trust DeepSeek, and I definitely don’t trust an unknown provider.
That’s Grok.
And thanks to you — and your valuable code — it’s going to get better.
Joking aside, that’s exactly how it works. There’s really no reason to blindly trust AI providers right now — it’s an insane arms race. Actual new knowledge is getting scarcer; what’s really improving is the quality of the training data selection.
In the end, you just have to decide who you’re willing to share your code with.
Yes, of course. GPT-5-high is, in my opinion, clearly better at math and at working with totally novel concepts, and I almost exclusively use it when working with highly math-heavy code. That’s actually what I meant by “GPT-5”.
It’s far more likely to consider all the implications of the new math, including the secondary implications, as well as the direct task at hand. It can take “new” math and immediately go, “Given this, clearly the way this was previously being done in the code was misguided”, without having to be told explicitly.
You understand it wrong: which models are enabled in the model selector does not matter for Auto. It selects models internally on the server, from a pool defined by the Cursor devs, and that can be whatever they deem fit to use.
If privacy is such a concern, you really shouldn’t use Cursor or any cloud-based tools at all, unless they guarantee you military-grade privacy (and even then it’s only a matter of you believing them).
I mean, he does have a point. Of course we shouldn’t be paranoid about some AI provider training on our codebase, because by itself it’s probably not super valuable to train on. However, if they train on hundreds of thousands of people’s “private” codebases that those people did not want trained on, that seems unethical and exploitative, because at that point they are getting clear competitive value from it. And because there is so much potential value there, I assume some of these AI providers are training on our code despite the privacy settings. At the end of the day it would be hard to prove; they are banking on that obfuscation, and can just say they trained it on another model.
tl;dr They are training on your data if it goes to the cloud.