Data structure awareness for user specified variable

neuralnet · October 20, 2024, 2:46pm

I want to be able to select a variable in the editor so that Cursor detects and maintains an outline of its underlying data structure. Any time I bring the file containing that variable into chat, Cursor would then feed the variable’s data structure to the LLM along with my query, etc.

petros · October 21, 2024, 5:38pm

Thank you for the feature request @neuralnet. Could you expand a bit on the specific problem you are trying to solve with your proposed feature request? I would love to understand that more.

Cheers,
Petros

neuralnet · October 21, 2024, 9:12pm

Occasionally, I need to tell Cursor what the data structure for a variable is, and it becomes inconvenient when I have to do it repeatedly. For example, I have this list of dicts loaded in from a third-party library that I’m working with:

>>> x[0]
{'start': '0.08', 'dur': '4.12', 'text': 'short string'}
>>> x[1]
{'start': '4.12', 'dur': '3.18', 'text': 'another short string'}

Cursor can’t see or infer the data structure for x and hence it can’t work with x effectively without me giving it context about the underlying data structure. Having to manually add that context into my prompts has been one of my bigger inconveniences so far. Sometimes, I don’t even realise that I need to provide context about a specific variable’s data structure until after I’ve read the response to my prompt, which is also not ideal. Given that many variables’ data structures can be inferred from the code, I figure the user ought to select the variables whose data structures they want to put under a spotlight. I wonder if the Cursor developers could manage to run commands in the background to figure out the variables’ data structures under the hood.
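As a rough illustration of what such a background inspection might produce, here is a minimal sketch in plain Python. The outline helper is hypothetical (it is not a Cursor feature or API), and it assumes lists are homogeneous, so it only looks at the first element.

```python
def outline(value, depth=0, max_depth=3):
    """Return a short, type-level outline of a nested Python value."""
    if depth >= max_depth:
        return type(value).__name__
    if isinstance(value, dict):
        inner = ", ".join(
            f"{key!r}: {outline(val, depth + 1, max_depth)}"
            for key, val in value.items()
        )
        return "{" + inner + "}"
    if isinstance(value, (list, tuple)):
        name = type(value).__name__
        if not value:
            return f"{name}[]"
        # Assumption: the sequence is homogeneous, so only the first item is inspected.
        return f"{name}[{outline(value[0], depth + 1, max_depth)}]"
    return type(value).__name__


# Sample data copied from the example above.
x = [
    {'start': '0.08', 'dur': '4.12', 'text': 'short string'},
    {'start': '4.12', 'dur': '3.18', 'text': 'another short string'},
]
print(outline(x))  # list[{'start': str, 'dur': str, 'text': str}]
```

An outline like this is short enough to paste into a prompt, and it makes it obvious that start and dur are strings rather than numbers.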

However, since making this thread, I’ve actually started to simply append code comments to my Python files that give the LLMs the relevant context:

# DATA STRUCTURE (do not remove)
# >>> x[0]
# {'start': '0.08', 'dur': '4.12', 'text': 'short string'}

As it turns out, appending example comments such as the one above is a pretty handy way to achieve what I was aiming for with this feature request, and I’m going to update my “Rules for AI” accordingly. Having said that, if Cursor can do some smart under-the-hood awareness for data structures, especially the opaque ones, I think that would be a pretty cool and useful feature.
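For anyone who would rather generate these comment blocks than type them by hand, a minimal sketch could look like the following. The data_structure_comment helper is hypothetical and simply formats one sample element in the same layout as the comment above.

```python
def data_structure_comment(name, sample, index=0):
    """Format one sample element as a '# DATA STRUCTURE' comment block."""
    lines = [
        "# DATA STRUCTURE (do not remove)",
        f"# >>> {name}[{index}]",
        f"# {sample[index]!r}",
    ]
    return "\n".join(lines)


# Sample data copied from the example above.
x = [{'start': '0.08', 'dur': '4.12', 'text': 'short string'}]
print(data_structure_comment("x", x))
```

Run in a REPL or scratch script, this prints a block that can be pasted next to the variable it describes.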

petros · October 22, 2024, 7:54am

Thank you so much for the additional context. I am glad you have figured out a way to use comments to improve the situation.

Indeed, AI cannot always infer the shape of data, especially when that shape is not defined in some structure.

I see comments being used more and more in that fashion, to assist AI so that its responses are more helpful and correct.

Another way to approach this, when you know the shape is not temporary and does not change dynamically on every run of your program, is to use project-based rules.

You can do that by introducing a .cursorrules file in the root of your project. You can add project-related context and rules in that file, and features such as Cursor Chat will include it.

Granted, that’s at the project level, but you could possibly mention files and variables and experiment with it.

Comments seem a bit better in that they are close to the actual variable. But if the rules file works for you, it might be a better choice if you don’t want to pollute your code with extra comments.

I hope that helps a bit.

Cheers,
Petros

neuralnet · October 22, 2024, 12:55pm

Thanks @petros. After playing with these ideas a bit more, I’ve found that closer is definitely better. I tried placing the variable’s data structure comments at (i) the end of the file and (ii) the end of the relevant function, much closer to where the variable is first defined. Placing them right below the function (option (ii)) gets better responses from the LLMs, and I can live with a little code comment pollution if it means the LLMs are more effective.

Here is an example of the LLM performing better in scenario (ii).
For context, I added the following rule to my “Rules for AI”:

Data structure comment blocks (starting with # DATA STRUCTURE) outline key variables and their data structures. When your code modifies the structure of any variable that is already described by such comment blocks, update the data structure comment block accordingly. Use information in these comment blocks to guide code writing.

So, as you can see, I’m asking the LLMs to update the data structure comments as required so that I don’t have to. With that rule in place, I prompted GPT-4o to change dur and start to type float at the place where x was first defined. In scenario (i), GPT-4o updated the code but not the data structure code comments. In scenario (ii), GPT-4o was able to completely follow the rule, both updating the code and data structure code comments properly.
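To make that concrete, the result in scenario (ii) would look roughly like the sketch below. This is a reconstruction for illustration, not GPT-4o’s actual output, and raw is a hypothetical stand-in for whatever the third-party library returns.

```python
# Hypothetical stand-in for the list of dicts returned by the third-party library.
raw = [
    {'start': '0.08', 'dur': '4.12', 'text': 'short string'},
    {'start': '4.12', 'dur': '3.18', 'text': 'another short string'},
]

# Convert the numeric fields to float at the point where x is first defined.
x = [
    {'start': float(item['start']), 'dur': float(item['dur']), 'text': item['text']}
    for item in raw
]

# DATA STRUCTURE (do not remove)
# >>> x[0]
# {'start': 0.08, 'dur': 4.12, 'text': 'short string'}
```

The point of the rule is the last three lines: the comment block now shows floats instead of strings, so it stays in sync with the code it describes.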

I think the .cursorrules option is the tidiest but also places the variable definitions furthest away, option (ii) is the least tidy but probably most effective for the LLM, and (i) is a bit of a middle ground. I decided to run with option (ii) for now. Thanks for your support. It has been very helpful.

petros · October 22, 2024, 1:49pm

I love the feedback and the details you have shared. This can help everyone in the forum who is looking to solve a similar problem. It also helps the team think about the problem in case they want to tackle it in the future.

Thank you again,
Petros
