TL;DR: Can we get into legal trouble using code generated by Cursor?
Details: Until now I've mainly used Cursor for hobby, personal, and non-commercial projects. However, suppose I use Cursor-generated code in a project that I end up selling via acquire.com, and the buyer runs a code plagiarism check. What are the odds that Cursor generated code that falls under a non-permissive license (e.g. GPL)? In that case my project would be infringing on copyrighted code.
This is an important question, and I did not find any information about it in the official docs. Can someone from the official Cursor team please clarify? Thanks. @ericzakariasson @deanrie @truell20
P.S. I understand that the code is generated by underlying models like Claude, but I also use another AI-powered IDE that does not train on GPL code. I like Cursor and want to be assured that we are in the safe zone when using it.
I’ll give you the benefit of the doubt for saying this, but let someone experienced answer. You could read my query properly and research it; it may help you someday.
I think this is a question you should ask the companies behind the LLMs you’re using in Cursor, since it’s about their training data. Cursor just passes along what the model returns. I highly doubt Cursor has some kind of middleware between the user and the LLM moderating its responses.
In most cases, unless you clone a website, use the exact same design, and sell the same product or service, there won’t be any copyright issues. In your case, whoever buys your SaaS doesn’t want to spend time building things themselves; they’re just interested in your product or service. And if you think AI just copy-pastes an entire codebase, then you have a very limited understanding of LLMs. The risk of reproducing the same code word for word is low because AI generates output based on patterns rather than pulling exact code.
You’re right to be thinking about this. The short answer is that, as you mentioned, Cursor itself doesn’t generate code; we integrate with various AI models (Claude, GPT-4, etc.) that do the actual code generation. Each model has its own approach to training and code generation.
While the underlying LLMs that power Cursor may have been trained on both permissively and non-permissively licensed code, the way they are trained means that if the model were to return a block of code, it would never be a 1-for-1 copy of any single repository, but rather a synthesis of the most likely answer to your question.
While the risk of exact code copying is very low, since these models generate code from learned patterns rather than copying it, if you’re developing commercial software it’s always smart to:
Review generated code carefully
Keep records of your development process
Consider getting a legal opinion if you’re particularly concerned
You can read more about how we handle code generation in our docs: Cursor – Overview
Thanks for clarifying, Dan. I asked because my startups serve enterprise customers in the finance and legal domains, and these customers require strict security compliance from their vendors. I’ve been using the GitHub Copilot Business plan, which offers indemnity protection against IP infringement claims arising from code generated by Copilot. I like Cursor’s workflow but will keep it for non-enterprise projects for now.