It’s not clear to me what local mode vs. non-local mode means for how code is processed. With local mode off, it says a “small amount of code” will be stored; with local mode on, it says none will be stored. With it off, is the stored data associated with our accounts? Is it deletable on request? Does it get deleted automatically after some period of time?

How much is a “small amount of code”? When running in codebase mode it appears to scan the entire codebase, which could be hundreds or thousands of files and millions of lines of code. What counts as “small”: 5 lines? 1,000? 100,000? Is it relative? And how are those lines chosen? Is there any way to mark certain files as inaccessible?

I’d be interested in supporting your “analytics”, but your security process around processed data is pretty opaque. I also think the IDE should default to local mode rather than require opt-in. And what about GPT’s behavior with the different modes?
Answers below. Note no code is stored if you turn on local mode.
Yes
No
Essentially whatever makes it into the prompt for the AI; think around 1k lines per request.
Soon! Thinking a .cursorignore file would be good, but let me know if you have opinions here.
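For example, gitignore-style patterns could work. Purely illustrative, nothing is decided yet:

```
# .cursorignore (illustrative syntax, not final)
secrets/
.env
internal/billing/
*.pem
```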
Noted. At the very least, we should probably make this a modal when you onboard.
Shouldn’t change.
Who do we contact if we want our data deleted? Any particular email?
I think the .ignore file sounds good. Perhaps there could also be an .include file, and if it exists it takes priority over the .ignore, so that only files listed in the .include file are scanned in codebase mode.
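Rough sketch of the precedence I have in mind (file names and helpers are made up, just to illustrate):

```typescript
import * as fs from "fs";

// Very crude pattern check just for illustration; a real implementation
// would use proper gitignore-style glob matching.
function matchesAny(path: string, patterns: string[]): boolean {
  return patterns.some((p) =>
    p.endsWith("/") ? path.startsWith(p) : path === p || path.endsWith(p)
  );
}

// Read non-empty, non-comment lines from a pattern file, or null if it doesn't exist.
function readPatterns(file: string): string[] | null {
  if (!fs.existsSync(file)) return null;
  return fs
    .readFileSync(file, "utf8")
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0 && !line.startsWith("#"));
}

// Decide whether a file may be scanned in codebase mode.
function isScannable(path: string): boolean {
  const include = readPatterns(".cursorinclude"); // hypothetical name
  if (include !== null) {
    // If an include file exists it wins: only listed files are scanned.
    return matchesAny(path, include);
  }
  // Otherwise fall back to the ignore file: scan everything not excluded.
  const ignore = readPatterns(".cursorignore") ?? [];
  return !matchesAny(path, ignore);
}
```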
Is the data sent to GPT the prompt plus whatever code is displayed as the context? I notice that when I do a global search, it shows me snippets of code from what it searches. Is that what is ultimately sent to GPT? And when local mode is disabled, is that what is stored on your servers, or is other code stored as well?
hi@cursor.so works well
I believe the snippets that are shown to you as context are stored for analytics right now. It wouldn’t be more than that, but it might be less code.
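To put it concretely, a simplified sketch of the storage decision (not our actual code; the function names are made up):

```typescript
// The snippets displayed to you as context are the only candidates for storage.
async function recordAnalytics(
  contextSnippets: string[],
  localMode: boolean
): Promise<void> {
  if (localMode) {
    // Local mode on: nothing is stored.
    return;
  }
  // Local mode off: at most the snippets shown as context are kept for analytics
  // (possibly fewer lines than what was displayed).
  await storeForAnalytics(contextSnippets);
}

// Hypothetical stub so the sketch compiles; the real storage path differs.
async function storeForAnalytics(snippets: string[]): Promise<void> {}
```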
And what is shown as context is what gets sent to GPT? I imagine not all of the codebase is sent to GPT? As far as I know, GPT doesn’t currently use data sent to its API for training unless explicitly directed to by a user, right? Do you ever ask or allow GPT to use the code it receives via its API (the context and the prompt, I assume) for training?
Thx for being so responsive btw.