I encountered the same issue and resolved it by splitting the file into smaller segments, which allowed normal processing. It seems Cursor may have a per-request memory constraint, since larger files can trigger service timeouts from excessive resource demands.
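For anyone who wants to try the same workaround, here's a minimal sketch of the kind of splitting I mean. The 1000-line chunk size and the file naming are just my own choices, not anything official from Cursor:

```python
# Rough sketch of the workaround: split a large file into smaller segments
# so each one can be processed in a separate request. The 1000-line chunk
# size is arbitrary -- pick whatever keeps things responsive for you.
from pathlib import Path

def split_file(path: str, lines_per_chunk: int = 1000) -> list[Path]:
    source = Path(path)
    lines = source.read_text(encoding="utf-8").splitlines(keepends=True)
    chunks = []
    for i in range(0, len(lines), lines_per_chunk):
        part = i // lines_per_chunk
        chunk_path = source.with_name(f"{source.stem}.part{part:03d}{source.suffix}")
        chunk_path.write_text("".join(lines[i:i + lines_per_chunk]), encoding="utf-8")
        chunks.append(chunk_path)
    return chunks

if __name__ == "__main__":
    print(split_file("big_module.py"))
```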
My two cents: it's probably not the API requests to the LLM that are crashing, but rather the diff algorithm, which for some reason can't handle more than ~1200 lines of code changes without becoming very slow.
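This is only a theory, but the general idea is easy to poke at locally. The sketch below times Python's difflib (not whatever diff implementation Cursor actually uses) on increasingly large edits, just to illustrate how diff cost can grow with the number of changed lines:

```python
# Toy experiment for the theory above: time a textual diff as the number of
# changed lines grows. This uses Python's difflib, NOT Cursor's actual diff
# implementation, so it only illustrates the general scaling behaviour.
import difflib
import random
import time

def time_diff(total_lines: int, changed_lines: int) -> float:
    old = [f"line {i}\n" for i in range(total_lines)]
    new = old.copy()
    for i in random.sample(range(total_lines), changed_lines):
        new[i] = f"line {i} changed\n"
    start = time.perf_counter()
    list(difflib.unified_diff(old, new))  # force the generator to run
    return time.perf_counter() - start

if __name__ == "__main__":
    for changed in (100, 400, 1200, 2400):
        print(f"{changed:>5} changed lines: {time_diff(5000, changed):.3f}s")
```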
Maybe a hardcoded VSCode RAM limit for sub-services or something? It looks like it, at least.
Models like Gemini are supposed to have a 1-million-token context window, which is theoretically hundreds of thousands of lines of code, so I don't think the bottleneck is the LLM itself, but rather the software/VSCode infrastructure that processes all this input/output, code changes, etc.
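Back-of-the-envelope version of that claim (the tokens-per-line figure is a rough assumption; real code varies a lot by language and line length):

```python
# Rough estimate behind the "hundreds of thousands of lines" claim.
CONTEXT_TOKENS = 1_000_000   # advertised Gemini context window
TOKENS_PER_LINE = 8          # rough guess for typical source code

print(f"~{CONTEXT_TOKENS // TOKENS_PER_LINE:,} lines of code fit in context")
# -> ~125,000 lines, far beyond the ~1200-line threshold where things slow down
```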