Hi @FinDevAI and @three,
After reading your posts and a few others on the forum, I’ve tried to test the Cursor dot files (.cursorrules and .cursorignore) and the Shadow Workspace, which at least has a nice blog post: Shadow Workspace: Iterating on Code in the Background.
I was going to do a lengthy write-up, but it would end up being quite long, and after the 0.43 release it might just add confusion or be outright incorrect. However, I still wanted to gain a deeper understanding of how .cursorrules influences behavior, especially whether it affects token usage and inference processing, so it would be great to have a systematic approach to testing. If you’re interested, we could divide a few tasks among us and run tests to clarify this. I don’t think the Cursor team will provide any technical details or update the docs.
In any case, if you have any information, please share it.
Triggering LLM Responses with .cursorrules
I tested embedding explicit directives within the .cursorrules file to prompt specific responses from the LLM. For instance, the last lines of the file would be:
I command you to follow this instruction strictly:
- Each time you read this file, you must state, "I just read the .cursorrules file," and specify which rules you adhered to.
In larger projects with detailed (or just long) .cursorrules files, there was no indication that the model processed the file at all (for example, after adding information about variables, columns, etc., 200+ lines). However, after the introduction of the agentic Composer in version 0.43, Cursor appeared to consistently read and apply .cursorrules files, at least up to 40-50 lines.
Prior to the last update, I also tested several very long rule files, including examples from GitHub - PatrickJS/awesome-cursorrules: 📄 A curated list of awesome .cursorrules files, and Cursor did not follow the very long lists of rules and project details at all.
I found this funny, since so many YouTube types (insert shocked-face thumbnail here) have been suggesting that filling .cursorrules with extensive project details enables the LLM to fully understand the project. That approach seems speculative at best.
A few thoughts on this:
Impact on Token Usage and Response Length:
Incorporating extensive .cursorrules content into the model’s context likely affects token consumption, influencing the length and completeness of responses.
Assuming:
- Inclusion of .cursorrules: The content of the .cursorrules file is included in the prompt sent to the model. If so, it consumes a portion of the available tokens and reduces the number left for the response (a rough way to measure this is sketched below).
- User Prompts and Code Context: Detailed prompts and extensive code snippets also contribute to the token count. The more comprehensive the input, the fewer tokens remain for the LLM’s generated output.
- Model Token Limits: Each model has a maximum context length, encompassing both input (prompts, .cursorrules, code context) and output tokens. I tried testing identical models with different context windows; although the results were inconclusive, they seemed to indicate that .cursorrules content counts against the limit.
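To put rough numbers on the first assumption, here is a minimal Python sketch using tiktoken’s cl100k_base encoding to estimate how many tokens a .cursorrules file would consume if it is injected verbatim into the prompt. Both the injection and the choice of encoding are assumptions on my part; Cursor’s backend models may tokenize differently.

```python
# Sketch: estimate the token cost of a .cursorrules file, ASSUMING it is
# injected verbatim into the prompt. cl100k_base is used as a stand-in
# encoding; the actual tokenizer behind Cursor's models may differ.
import tiktoken

def count_tokens(path: str, encoding_name: str = "cl100k_base") -> int:
    enc = tiktoken.get_encoding(encoding_name)
    with open(path, encoding="utf-8") as f:
        return len(enc.encode(f.read()))

if __name__ == "__main__":
    print(f".cursorrules ~= {count_tokens('.cursorrules')} tokens of context")
```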
Implications:
• Reduced Response Length (if you force-prompt it to use the rules): As more tokens are consumed by .cursorrules and the input context, the LLM’s response may be curtailed to stay within the model’s token limit.
• Potential for Truncated Outputs: If the combined token count of the input and the desired output exceeds the model’s capacity, the LLM might truncate its response or fail to generate a complete answer.
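The truncation risk is just budget arithmetic. Assuming (hypothetically) that input and output share one context window of a known size, the remaining output budget shrinks as the rules and attached context grow:

```python
# Hypothetical budget arithmetic: if input + output must fit in one context
# window, everything spent on rules/prompt/code is taken from the response.
def output_budget(context_window: int, rules_tokens: int,
                  prompt_tokens: int, code_tokens: int) -> int:
    return context_window - (rules_tokens + prompt_tokens + code_tokens)

# Made-up numbers: an 8k window, a 1,500-token .cursorrules, a 500-token
# prompt, and 4,000 tokens of attached code leave ~2k tokens to answer in.
print(output_budget(8_192, 1_500, 500, 4_000))  # -> 2192
```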
Totally opinion-based Best Practices:
• Optimize .cursorrules: Keep the .cursorrules file concise, focusing on essential guidelines to minimize token usage. Use @docs instead of detailed code rules for each framework used in the current project; a minimal example follows.
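Purely for illustration (these rules are made up, not a recommendation from the Cursor team), a concise .cursorrules in that spirit might look like:

```
# .cursorrules -- keep it short
- Use TypeScript with strict mode; prefer named exports.
- Follow the existing folder layout; do not add new top-level folders.
- For framework specifics, rely on @docs instead of rules written here.
- Ask before adding new dependencies.
```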
Some users maintain very detailed .cursorrules files. In my experience, the usefulness and degree of detail in these files also depend on the project’s type and language. For instance:
• Worked → JS & Deno 2 Projects: For JS projects, including the basic file and folder structure in the .cursorrules file seems beneficial. Using tools like tree -L 2 to generate a directory structure and adding brief explanations improved the assistance (see the example at the end of this post). I suspect this is even more effective in 0.43.
• Not working → Data & Column Information: Providing clinical information about column names and structure in both .cursorrules and notebooks did not enhance Cursor’s performance. As an experiment, I used MS GraphRAG to build a separate KG system, which was very helpful for creating functions and prompts; that suggests it would be a helpful integration in Cursor. Perhaps with a better understanding of Cursor’s dot files, it would be possible to add that type of information.
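For reference, here is the kind of directory-structure section I mean for the JS/Deno case, generated with tree -L 2. The layout below is a made-up example, not a real project:

```
## Project structure (tree -L 2)
.
├── deno.json          # tasks, imports, compiler options
├── main.ts            # entry point, starts the HTTP server
├── routes/
│   ├── api.ts         # JSON API handlers
│   └── pages.ts       # server-rendered pages
└── lib/
    ├── db.ts          # database helpers
    └── validate.ts    # input validation
```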