I am using Cursor for Python and Python notebooks (aka Jupyter/IPython). I have fairly large notebooks with a lot of output (for example, one file is 2.1 MB, so roughly 2 million characters).
When I use agent/chat on the ipynb file, the entire file appears to be attached, including all this output. At worst this causes the entire request to fail, often after a few conversation turns have overflowed the context, in which case I have to clear all outputs and try again. But even if the request succeeds, it’s a big waste of context.
I would suggest that when the file is an ipynb, Cursor should by default preprocess the file to remove the outputs. The user could choose to override this and include the outputs if they want, or include them only for a specific cell or selection; a rough sketch of the preprocessing I mean is below. I realize that dropping the outputs would make it harder to apply suggested diffs from the agent, and obviously the agent’s diffs shouldn’t clear all outputs.
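To illustrate, here is a rough sketch of the kind of preprocessing I have in mind, using nbformat (the filenames are just placeholders, and how this would actually hook into Cursor is up to them):

import nbformat

# Load the notebook and drop all cell outputs from the copy sent to the model
nb = nbformat.read("notebook.ipynb", as_version=4)
for cell in nb.cells:
    if cell.cell_type == "code":
        cell.outputs = []
        cell.execution_count = None
nbformat.write(nb, "notebook_no_outputs.ipynb")

This is essentially what “Clear All Outputs” or jupyter nbconvert --clear-output already does, just applied to the copy that goes into the context rather than to the file on disk.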
I would note there are several other problems with ipynb files: it’s difficult to apply changes from the agent, it’s difficult to debug errors compared to straight Python, etc., but this context size issue is the worst.
How to reproduce:
Run a very verbose notebook: something like print(f"hello {n}") for n in range(1, 10000000) in a few cells should do it (see the cell below).
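For example, a cell like this produces enough output to blow up the file size:

for n in range(1, 10000000):
    print(f"hello {n}")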
I’ve observed similar problems with images displayed in a notebook. Cursor completely hangs and fails to save a notebook containing 10 images, while VS Code handles the same notebook without problems. In some cases a remote restart was required to recover. Is the image data (part of the notebook cell output) being added to the context somehow?
To reproduce, make a notebook with one cell containing:
import matplotlib.pyplot as plt
import numpy as np
for i in range(10):
    plt.imshow(np.random.rand(1024, 768))
    plt.show()
Run the cell, then attempt to save the notebook (or do anything else).
You may have to force quit Cursor and restart your remote to recover.
I am also having this issue; it makes it impossible to work with Jupyter notebooks past a certain point. Because the file is too big, the Cursor agent uses up a bunch of tool calls trying to extract the cell I’m looking for, which also fills the context with useless data that confuses the LLM. It also struggles to edit the notebook and sometimes adds cells in the wrong place or forgets to remove old cells.
I have also experienced Cursor randomly hanging while working in Jupyter notebooks, although I can’t confirm whether it’s related to images. What I can confirm is that having images in the notebook (such as generated plots stored as base64) also makes it difficult for the agent to work with it, because the notebook gets too long. At least for this issue the solution seems simple: strip the base64 image data from the notebook before giving it to the LLM, and maybe add an option to upload the images separately if the LLM is multimodal. A rough sketch of what I mean is below.
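Again just an illustration with nbformat (the filenames are placeholders), stripping only the image payloads while keeping text outputs:

import nbformat

nb = nbformat.read("notebook.ipynb", as_version=4)
for cell in nb.cells:
    if cell.cell_type != "code":
        continue
    for output in cell.get("outputs", []):
        data = output.get("data", {})
        # Remove base64-encoded image payloads, keep text/plain outputs
        for mime in list(data):
            if mime.startswith("image/"):
                del data[mime]
nbformat.write(nb, "notebook_no_images.ipynb")

If the LLM is multimodal, the deleted payloads could instead be decoded and attached as proper images.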
For long textual outputs, I would suggest collapsing repeated lines (a naive sketch is below) or maybe using another LLM to extract the important parts before providing the output to the agent. In any case, I would appreciate a fix for this, as it’s currently really quite infuriating to work with notebooks in Cursor.
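Here is a naive sketch of the repeated-line collapsing I mean (purely illustrative, not a real implementation):

def collapse_repeats(text):
    # Replace runs of identical lines with one line plus a repeat count
    out, prev, repeats = [], None, 0
    for line in text.splitlines():
        if line == prev:
            repeats += 1
            continue
        if repeats:
            out.append(f"... [previous line repeated {repeats} more times]")
        out.append(line)
        prev, repeats = line, 0
    if repeats:
        out.append(f"... [previous line repeated {repeats} more times]")
    return "\n".join(out)

Even something this crude would help with outputs like repeated warnings or progress lines; smarter summarization (e.g. via another LLM) could come later.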