Cursor indexing can freeze/crash large ML workspaces and may correlate with tracked files being deleted

Where does the bug appear (feature/product)?

Cursor IDE

Describe the Bug

Cursor repeatedly becomes unusably slow or crashes when indexing a large ML/video workspace.

This has happened many times:

  • Cursor slow/crash during indexing: 10+ times
  • Tracked files unexpectedly appearing as deleted in git status: 3+ times

During the incident, I monitored top continuously and observed total Cursor-related CPU usage staying above 350% for a while. The main process was node … --type=extensionHost, and several rg processes were running at the same time.

The rg commands looked like:

rg --files --hidden --case-sensitive --no-require-git --no-ignore-parent --follow --no-config --no-ignore-global …
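To capture the CPU figures more reproducibly than by watching top, a sampler along these lines could be left running on the remote host. This is only a sketch: the function names are hypothetical, and the matching rule (any command line containing --type=extensionHost, or whose executable is named rg) is an assumption based on the processes described above, so it would also count an rg started by hand.

```python
import os
import subprocess
import time


def is_cursor_related(args: str) -> bool:
    """Heuristic (assumption): extensionHost processes and any executable named rg."""
    if "--type=extensionHost" in args:
        return True
    parts = args.split()
    return bool(parts) and os.path.basename(parts[0]) == "rg"


def total_cpu(ps_output: str) -> float:
    """Sum %CPU over matching lines of `ps -A -o pcpu= -o args=` output."""
    total = 0.0
    for line in ps_output.splitlines():
        fields = line.strip().split(None, 1)
        if len(fields) != 2:
            continue
        try:
            cpu = float(fields[0])
        except ValueError:
            continue  # skip anything that is not "<number> <command line>"
        if is_cursor_related(fields[1]):
            total += cpu
    return total


def sample_forever(interval_s: float = 5.0) -> None:
    """Print one timestamped total per interval until interrupted."""
    while True:
        out = subprocess.run(
            ["ps", "-A", "-o", "pcpu=", "-o", "args="],
            capture_output=True, text=True, check=True,
        ).stdout
        print(time.strftime("%Y-%m-%dT%H:%M:%S"), f"{total_cpu(out):.1f}%", flush=True)
        time.sleep(interval_s)
```

Calling sample_forever() in a separate terminal and redirecting its output to a file would give a timestamped log that can be lined up against the moment Cursor freezes.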

The workspace contains many ML/video artifacts: model checkpoints, generated images/videos/audio, runtime caches, preprocessing outputs, training runs, package caches, and large binaries such as .pth, .onnx, .trt, .mp4, .jpg, .npz, .pkl.

The most serious issue is that tracked files unexpectedly appeared as deleted in git status. In one case, this broke a running dev server because source files disappeared from the working tree:

ModuleNotFoundError: No module named 'app.api.rest'

I recovered the files with:

git ls-files -d -z | xargs -0 git restore --
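A slightly more cautious variant records which files were missing before restoring them, so the list can be attached to a report. This is a sketch, not a tested tool: the helper names and the batch size are hypothetical, and it assumes Git 2.23 or later (where git restore is available).

```python
import subprocess
from pathlib import Path


def parse_nul_list(data: bytes) -> list[str]:
    """Split NUL-separated output from `git ls-files -z` into paths."""
    return [p.decode("utf-8", "surrogateescape") for p in data.split(b"\0") if p]


def restore_deleted(repo: str, log_path: str, batch: int = 500) -> list[str]:
    """Log tracked files missing from the working tree, then restore them."""
    out = subprocess.run(
        ["git", "-C", repo, "ls-files", "-d", "-z"],
        capture_output=True, check=True,
    ).stdout
    deleted = parse_nul_list(out)
    if deleted:
        # Write the evidence first, so it survives even if the restore fails.
        Path(log_path).write_text(
            "\n".join(deleted) + "\n", encoding="utf-8", errors="surrogateescape"
        )
        # Restore in batches to stay under the OS argument-length limit.
        for i in range(0, len(deleted), batch):
            subprocess.run(
                ["git", "-C", repo, "restore", "--", *deleted[i:i + batch]],
                check=True,
            )
    return deleted
```

Pointing log_path at a location outside the workspace keeps the record safe if files disappear again.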

I understand rg itself should be read-only, so I am not claiming ripgrep directly deleted files. But the repeated sequence is:

  1. Cursor indexing starts
  2. extensionHost / rg becomes very heavy
  3. Cursor freezes or crashes
  4. tracked files sometimes appear deleted

This seems unsafe for large workspaces.

Steps to Reproduce

  1. Open a large ML/video workspace in Cursor.
  2. The workspace contains many generated artifacts and large binary files, such as:
    • model checkpoints
    • training runs
    • runtime caches
    • preprocessing outputs
    • generated images/videos/audio
    • package caches
    • *.pth, *.onnx, *.trt, *.mp4, *.jpg, *.npz, *.pkl
  3. Let Cursor indexing run.
  4. Monitor CPU usage with top.
  5. Observe node … --type=extensionHost and multiple rg processes consuming high CPU.
  6. In some cases Cursor becomes unusable or crashes.
  7. In several incidents, git status showed many tracked files as deleted, even though I did not delete them.
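
To put numbers on "large" when reproducing this, the workspace can be summarized by file extension without following symlinks. A minimal sketch; summarize_tree is a hypothetical helper, not something Cursor provides.

```python
import os
from collections import Counter


def summarize_tree(root: str):
    """Return (file counts by extension, bytes by extension, symlink count)."""
    counts, sizes = Counter(), Counter()
    symlinks = 0
    # followlinks=False: symlinked directories are listed but never descended into.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            if os.path.islink(os.path.join(dirpath, name)):
                symlinks += 1
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # count only real files toward the size totals
            ext = os.path.splitext(name)[1].lower() or "(none)"
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable
            counts[ext] += 1
            sizes[ext] += size
    return counts, sizes, symlinks
```

For example, sizes.most_common(10) on the result lists the ten extensions that occupy the most space, and a large symlink count hints at how much --follow could widen a scan.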

Expected Behavior

Cursor indexing/search should never modify, delete, or cause tracked source files to disappear.

If a workspace is too large or risky to index safely, Cursor should detect that and warn or stop indexing.

Cursor should avoid aggressive scans such as --hidden + --follow + --no-ignore-parent across large ML artifact directories unless the user explicitly allows it.

The IDE should remain usable, or at least fail safely without affecting the working tree.

Operating System

macOS

Version Information

Version: 3.2.11
VSCode Version: 1.105.1
Commit: e9ee1339915a927dfb2df4a836dd9c8337e17cc0
Date: 2026-04-24T14:36:47.933Z
Layout: editor
Build Type: Stable
Release Track: Default
Electron: 39.8.1
Chromium: 142.0.7444.265
Node.js: 22.22.1
V8: 14.2.231.22-electron.0
OS: Darwin arm64 25.3.0

Connection type: Remote SSH workspace
Remote OS: Linux 5.19.0-32-generic
Cursor server: running under ~/.cursor-server on the remote machine

For AI issues: which model did you use?

N/A - this appears to be an IDE indexing / extensionHost / file watcher issue, not a model-specific issue.

Additional Information

Regression note:
This issue started after upgrading to Cursor 3.x. The same workspace did not show this behavior before Cursor 3.

I do not have a screenshot or screen recording from the incident.

However, I monitored top continuously during the incident and observed total Cursor-related CPU usage staying above 300% for a while. The main process was node … --type=extensionHost, with multiple rg processes running at the same time.

After the incident, git status showed many tracked files as deleted, and a dev server failed with ModuleNotFoundError until I restored the files with git restore.

Workaround attempted:
I added .cursorignore rules to exclude heavy/generated artifacts while keeping source code and docs indexable. The excluded categories include model checkpoints, runtime caches, preprocessing outputs, training runs, generated media, package caches, and large binary files.

This may reduce the load, but once extensionHost enters the high-CPU state, a Cursor window reload seems necessary.

Suggested safeguards:

  • Detect large/high-risk workspaces before indexing.
  • Warn before indexing directories with many generated artifacts or huge binaries.
  • Avoid following symlinks by default in indexing scans.
  • Ensure indexing/file watching cannot affect the working tree.

Does this stop you from using Cursor?

Yes - Cursor is unusable

Hi @zKojira,

Thank you for the detailed report — the process monitoring, rg command flags, and recovery steps are very helpful.

You’re hitting two separate known issues that tend to co-occur on large workspaces accessed via Remote SSH:

1. High CPU from indexing/scanning

The rg processes you’re seeing scan your workspace for rules files and codebase indexing. On workspaces with many large binary files and artifacts, this can consume excessive CPU. A few things that should help beyond what you’ve already done with .cursorignore:

  • Disable symlink following for search: Go to File > Preferences > Settings, search for search.followSymlinks, and set it to false. The rg flags you captured (--follow) indicate symlinks are being traversed, which can dramatically increase scan scope on ML workspaces.

  • Create a .ignore file in your workspace root (separate from .cursorignore). This is natively respected by rg and will apply to the rules discovery scans that .cursorignore does not control. Add patterns for your heavy directories:

    # ML artifacts
    **/*.pth
    **/*.onnx
    **/*.trt
    **/*.npz
    **/*.pkl
    **/checkpoints/
    **/training_runs/
    **/preprocessed/

  • Exclude directories from the file watcher: In your workspace settings (.vscode/settings.json), add files.watcherExclude for large artifact directories:

    {
      "files.watcherExclude": {
        "**/checkpoints/**": true,
        "**/training_runs/**": true,
        "**/generated/**": true
      }
    }

  • Disable rules import from subfolders: Go to Cursor Settings > Rules, Skills, Subagents and disable “Include .cursor/rules from subfolders” if you don’t need it. This eliminates one of the two rg processes that scan your entire workspace.

2. Files appearing as deleted after crashes

This is a separate known issue where extension host crashes during Remote SSH sessions can leave the workspace in an inconsistent state. The indexing itself is read-only and cannot delete files — the file disappearance is a side effect of the crash/disconnect, not the scanning.

Your recovery approach (git ls-files -d -z | xargs -0 git restore --) is correct. As a precaution, consider committing more frequently (even WIP commits) and backing up any untracked files that are important, since untracked files can’t be recovered via git.
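
As a rough illustration of the untracked-file backup mentioned above, something like the following could be run periodically. It is a sketch under assumptions: the function names are hypothetical, and it relies on git ls-files --others --exclude-standard to list untracked files that are not ignored, so artifacts covered by .gitignore stay out of the archive.

```python
import subprocess
import tarfile
from pathlib import Path


def list_untracked(repo: str) -> list[str]:
    """Untracked, non-ignored files, as repo-relative paths."""
    out = subprocess.run(
        ["git", "-C", repo, "ls-files", "--others", "--exclude-standard", "-z"],
        capture_output=True, check=True,
    ).stdout
    return [p.decode("utf-8", "surrogateescape") for p in out.split(b"\0") if p]


def archive_files(repo: str, rel_paths: list[str], archive_path: str) -> int:
    """Write the given repo-relative files to a gzip tarball; return how many were added."""
    added = 0
    with tarfile.open(archive_path, "w:gz") as tar:
        for rel in rel_paths:
            src = Path(repo, rel)
            if src.is_file():  # skip anything that has already disappeared
                tar.add(str(src), arcname=rel)
                added += 1
    return added
```

A typical call would be archive_files(repo, list_untracked(repo), archive_path), with archive_path pointing outside the workspace so the backup is not exposed to the same problem.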

Both issues are being tracked by our engineering team. Your report — especially the detailed rg command flags and the regression note about Cursor 3.x — helps with prioritization.

Let me know if the .ignore file and the symlink setting help reduce the CPU load.

@mohitjain

Thank you for the detailed response. The distinction between the high-CPU indexing/scanning issue and the Remote SSH extension host crash/inconsistent workspace state makes sense.

I want to add one important clarification: before adding .cursorignore, I disabled Codebase Indexing itself in Cursor settings. After disabling Codebase Indexing, both symptoms stopped happening.

Before disabling Codebase Indexing:

  • Cursor repeatedly became very slow or crashed during indexing/scanning.
  • I observed extensionHost and multiple rg processes consuming very high CPU.
  • Tracked files appeared as deleted in git status several times.
  • In one case, the running dev server broke because source files disappeared from the working tree.

After disabling Codebase Indexing in Cursor settings:

  • Cursor no longer becomes unusably slow in this workspace.
  • I have not seen the tracked-file-deleted issue happen again.
  • The workspace has remained stable.

So I agree that rg itself should be read-only, and I am not claiming that rg directly deletes files. However, based on the before/after behavior, there appears to be a strong causal relationship between Cursor’s Codebase Indexing/scanning workload and the file disappearance/inconsistent workspace state during Remote SSH crashes.

I later added .cursorignore as an additional mitigation, excluding large ML/video artifacts, generated files, checkpoints, training runs, caches, media files, and model binaries. However, this was added after Codebase Indexing had already been disabled, so .cursorignore is not what made the symptoms stop.

I will also consider the additional mitigations you suggested:

  • Set search.followSymlinks to false.
  • Add a root .ignore file so rg-based scans outside .cursorignore also avoid heavy artifact directories.
  • Add files.watcherExclude entries for large generated directories.
  • Disable “Include .cursor/rules from subfolders” if it is not needed.

The main issue for me is that Codebase Indexing currently seems to be controlled by a global setting. Disabling it avoids the problem in this large Remote SSH ML/video workspace, but it also disables indexing for other projects where Codebase Indexing is useful and safe. That makes the workaround very inconvenient.

Is there an official way to keep Codebase Indexing disabled only for a specific Remote SSH workspace, while leaving it enabled for other projects?

As a product suggestion, Cursor should detect large ML/video workspaces or unusually risky scan scopes and warn before running aggressive scans such as --hidden, --follow, and --no-ignore-parent, or stop the scan by default. When tracked files can appear as deleted after crashes, this is no longer just a performance issue; it becomes a data-loss risk. Even if indexing itself is read-only, if the indexing/scanning workload can trigger Remote SSH crashes or workspace inconsistency, Cursor should fail safely.

Thanks again for tracking this.

Thanks for the clarification.
About per-workspace control: the main “Index New Folders” toggle in Cursor Settings is global, but you can disable indexing for a specific workspace. Open the Codebase Indexing panel in Cursor Settings while you have your ML workspace open, then click the delete (trash) button next to the index. This removes the index for that workspace and marks it as “do not index.” Other workspaces you open will still be indexed normally.

After deleting the index, the ML workspace will show as “not indexed” and won’t re-index automatically. If you ever want indexing back for that workspace, you can re-enable it from the same panel.

That said, I’d still recommend keeping the .cursorignore, .ignore, and files.watcherExclude configurations you’ve set up. Even with indexing disabled, they reduce load from other scanning processes (rules discovery, file watching) that run independently.

Your suggestion about large workspace detection is noted alongside the existing tracking.

Thank you for the explanation about per-workspace indexing control.

My understanding is that the global “Index New Folders” setting is not the only option: if I open this Remote SSH ML workspace and delete its index from the Codebase Indexing panel, this workspace can remain in a “not indexed / do not index” state while other workspaces can still use indexing.

I also understand that .cursorignore, a root .ignore, files.watcherExclude, rules discovery, file watching, and other scan paths may still operate independently of Codebase Indexing, so keeping those mitigations can still be useful even when Codebase Indexing is disabled.

I want to add an important update: a similar file-deletion issue has happened again.

To clarify the premise: the issue I have been reporting is not that large ML/video artifacts were deleted. From the beginning, the problem has consistently been that tracked source files and source docs disappeared from the working tree and appeared as deleted in git. Large ML/video artifacts were mentioned as a possible cause of heavy Cursor scanning load, but they were not the files being deleted.

After the previous incident, I temporarily set .cursorignore to *, effectively hiding the entire repo from Cursor. During the period when .cursorignore was set to *, I did not observe the same deletion issue.

Later, following the advice in this thread, I changed .cursorignore so that only large ML artifacts and generated outputs were excluded, while normal source files and docs were visible to Cursor again.

However, Cursor Codebase Indexing itself is still fully disabled. Therefore, I cannot say that Codebase Indexing directly processed and deleted these files. More accurately, I suspect that something around Cursor’s Remote SSH workspace reload, agent tooling, rules discovery / scanning, file watcher, internal cleanup, or file operation path may be causing source/docs files in the working tree to become deleted.

What I confirmed this time:

  • Some tracked source files / docs appeared as deleted in git.
  • Several untracked helper scripts and docs that had been added during recent work also disappeared.
  • No large ML/video artifacts were deleted this time either.
  • I did not run any explicit delete operation or any git operation that would delete those files.
  • The agent operation logs also do not show any delete operation against the affected files.
  • Tracked files could be restored from HEAD.
  • Uncommitted changes and untracked files were reconstructed as much as possible from Cursor agent transcripts / tool outputs.
  • I preserved the immediate post-incident git status, diff, deleted-file list, and extracted recovery patch candidates locally.
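
For anyone who needs to preserve the same kind of post-incident evidence, the capture step can be scripted roughly as follows. This is a sketch: the output file names and the choice of git commands are my own assumptions, not the exact commands used in this incident.

```python
import subprocess
import time
from pathlib import Path

# Output file name -> git arguments. The file names are arbitrary choices.
SNAPSHOT_COMMANDS = {
    "status.txt": ["status", "--porcelain=v1", "--untracked-files=all"],
    "deleted.txt": ["ls-files", "-d"],
    # Lowercase "d" excludes deletions, leaving edits to files that survived.
    "diff.patch": ["diff", "--binary", "--diff-filter=d"],
    "head.txt": ["rev-parse", "HEAD"],
}


def run_git(repo: str, args: list[str]) -> bytes:
    """Run one git command in the repo and return its raw stdout."""
    return subprocess.run(
        ["git", "-C", repo, *args], capture_output=True, check=True
    ).stdout


def save_snapshot(repo: str, out_root: str, runner=run_git) -> Path:
    """Write each command's output into a new timestamped directory under out_root."""
    out_dir = Path(out_root, time.strftime("incident-%Y%m%dT%H%M%S"))
    out_dir.mkdir(parents=True, exist_ok=False)
    for name, args in SNAPSHOT_COMMANDS.items():
        (out_dir / name).write_bytes(runner(repo, args))
    return out_dir
```

Running this before any git restore keeps the pre-recovery state intact; out_root should be outside the affected workspace.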

The important new observation is that the workspace was stable while .cursorignore was set to *, but after normal source/docs were made visible to Cursor again, source/docs files became deleted again even though Codebase Indexing itself remained disabled.

I still cannot determine the exact root cause. But based on the previous incidents and this recurrence, I suspect this may involve not only Codebase Indexing itself, but also Cursor’s Remote SSH workspace reload / agent tooling / rules discovery / scanning / file watcher / cleanup / file operation path.
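
One way to narrow this down would be to timestamp the moment tracked files go missing and compare it with Cursor reloads, reconnects, and agent activity. The following is a sketch of such a watcher, with hypothetical function names. It polls git rather than hooking the filesystem, so it only sees tracked files and only with the granularity of the polling interval.

```python
import subprocess
import time


def deleted_tracked(repo: str) -> set[str]:
    """Tracked files currently missing from the working tree."""
    out = subprocess.run(
        ["git", "-C", repo, "ls-files", "-d", "-z"],
        capture_output=True, check=True,
    ).stdout
    return {p.decode("utf-8", "replace") for p in out.split(b"\0") if p}


def changes(previous: set[str], current: set[str]) -> tuple[list[str], list[str]]:
    """Return (newly missing, reappeared) paths between two polls, sorted."""
    return sorted(current - previous), sorted(previous - current)


def watch(repo: str, interval_s: float = 10.0) -> None:
    """Print a timestamped line whenever a tracked file disappears or comes back."""
    previous = deleted_tracked(repo)
    while True:
        time.sleep(interval_s)
        current = deleted_tracked(repo)
        missing, back = changes(previous, current)
        stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
        for path in missing:
            print(stamp, "MISSING", path, flush=True)
        for path in back:
            print(stamp, "RESTORED", path, flush=True)
        previous = current
```

Running watch() from a plain SSH session outside Cursor, with output redirected to a file, would keep the log independent of the extension host.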

The fact that source files and docs in the working tree can suddenly become deleted is itself unacceptable for a development tool. Even if tracked files can be restored from git, unexpected deletion still requires recovery work, and if only some files disappear, it can take a long time to trace the resulting build errors, runtime errors, or subtle behavior changes back to the missing files. The situation is even worse for uncommitted or untracked files, but even committed, tracked files must not disappear from the working tree without an explicit user action.

If this kind of issue can continue to happen, it becomes very difficult to keep using Cursor for this work. At a minimum, a development tool must be able to guarantee that it will not destroy the user’s working tree.