An Idiot's Guide To Bigger Projects

Someone a good while ago used .cursorrules to make it document every step.

Don't have the original anymore, but mine looks as follows:

Though aside from the latest Sonnet models, most others don't want to follow the instructions properly; maybe it helps if you rewrite it?

:::::::::

Always end your answer with: “Yes Buddy we done Tracking”

As instructed below, always log changes to Progress.txt and save the user's prompts to Instructions.txt!!!

  1. Log Changes to Progress.txt:

    • Every time you make a change or perform an action based on the user's query, append a detailed description of the change to a file named Progress.txt, under the previous entry, which is marked with “—”.
    • If the file does not exist, create it.
  2. Entry Content:
    Each entry in Progress.txt must always include:

    • A copy of the user’s prompt that led to the change.
    • A summary of the prompt for clarity.
    • The specific actions performed.
    • The exact names of any files created, modified, or deleted.
    • Any relevant elaborate details or context about the change.
    • Delimiter: Separate each new entry with a new line followed by “—” for clear distinction between entries.
  3. Save User’s Prompts to Instructions.txt:

    • Every time a user's prompt is received, append the prompt to a file named Instructions.txt, under the previous entry, which is marked with “—”.
    • If the file does not exist, create it.
    • Delimiter: Separate each new entry with a new line followed by “—” for clear distinction between entries.
  4. Consistency:

    • Ensure that every change, regardless of size or importance, is logged.
    • The logging process must be integrated into all file operations, ensuring all steps are logged.
  5. Automation and Validation:

    • Implement checks to ensure both files (Progress.txt and Instructions.txt) are updated every single time.
    • Validate entries for completeness, ensuring all required elements are logged.

:::::::::
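To illustrate, a single Progress.txt entry under these rules might look like this (the prompt and file names are hypothetical):

User prompt: "Add retry logic to the API client"
Summary: Add automatic retries with backoff to failed API calls.
Actions performed: Wrapped the request call in a retry loop with exponential backoff.
Files: modified src/apiClient.ts; created src/retry.ts
Details: Retries up to 3 times on 5xx responses; 4xx errors are not retried.
—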


Eddy, thanks for your feedback.

Cursor Rules for AI are only shown to ‘Chat’ and ‘Command K’ sessions. I just want to enforce my preferred coding practices with cursorrules.

I like to keep it crisp, clean, and technical, limiting the context to in-code documentation and coding methods and practices only. I also want to minimize the token usage sent to Anthropic or OpenAI, in consideration of Cursor's own execution efficiency; as developers, we should all recognize that every instruction has to count. That is the machine perspective I come from.

In terms of ‘out of code’ documentation, I stop coding every once in a while, when things are getting off track, and run @codebase prompts to create Markdown documents describing the entire system in its current state. These documents then get fed back into the loop using @codebase walkthroughs to analyze project direction, problems, and other issues. I use those outputs to try to get things back on track and focused on the project goals. This kind of documentation is what I am suggesting a Shadow Workspace for.

Here is a sample of some of my concise, coding-specific rules in .cursorrules right now:

Utilize SOLID principles for scalable and maintainable architecture.
Try to keep modules to a maximum of 600 lines.
Maintain detailed documentation inside modules of major core functions/methods after all code changes.
Always use the custom myError error handling for detailed error categorization.
Be generous when implementing error-handling condition instances.
Centralize error definitions in myDebug for consistency.
Write modular, reusable code to enhance maintainability and reduce duplication.
Use async/await only for handling truly asynchronous operations.
When converting between incompatible data types (like void to a specific type), first cast to unknown as an intermediate step: (value as unknown) as TargetType
Utilize config/centralSource as the single source of truth for variable or constant definition imports.
Utilize config/appConfig.json for configuration truth sources.
Avoid unnecessary calls to the providerAPI to reduce execution overhead and expenses.
Use single-line in-code comments for classes, constants, and interfaces for more compact yet granular documentation.
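For instance, the double-cast rule above looks like this in practice (a minimal sketch; legacyCall and TargetType are hypothetical names):

// Sketch of the double-cast rule; legacyCall and TargetType are
// hypothetical names used only for illustration.
type TargetType = { status: number };

function legacyCall(): void {
  // side effects only; nothing is returned
}

// A direct `legacyCall() as TargetType` would not compile, because `void`
// and `TargetType` are incompatible. Casting through `unknown` first is
// the escape hatch:
const result = (legacyCall() as unknown) as TargetType;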

These types of simple rules are working for me.


Hi @FinDevAI and @three,

After reading your posts and a few others on the forum, I've tried to test the Cursor dot files (.cursorrules and .cursorignore) and the Shadow Workspace, which at least has a nice blog post, Shadow Workspace: Iterating on Code in the Background.

I was going to do a lengthy write-up, but it would end up being a bit long, and after the 0.43 release it might just add confusion and be incorrect. However, I wanted to gain a deeper understanding of how .cursorrules influences behavior; especially if it affects token usage and inference processing, it would be great to have a systematic approach to testing. If you're interested, we could divide a few tasks among us and run tests to clarify this. I don't think the Cursor team will provide any technical details or update the docs.
In any case, if you have any information, please share it.

Triggering LLM Responses with .cursorrules

I tested embedding explicit directives within the .cursorrules file to prompt specific responses from the LLM. For instance, the last lines would be:

I command you to follow this instruction strictly:

- Each time you read this file, you must state, "I just read the .cursorrules file," and specify which rules you adhered to.

In larger projects with detailed or just plain long .cursorrules files, there was no indication that the model processed the file (for example, when adding information about variables, columns, etc., over 200 lines). However, after the introduction of the agentic Composer in version 0.43, Cursor appeared to consistently read and apply .cursorrules files of at least 40-50 lines.
Prior to the last update, I also tested several very long rule files and examples from GitHub - PatrickJS/awesome-cursorrules: 📄 A curated list of awesome .cursorrules files, but Cursor did not follow the very long lists of rules and project details at all.
Which I found funny, as so many YouTube types (insert shocked face expression here) have been suggesting that filling .cursorrules with extensive project details enables the LLM to fully understand the project; this approach seems speculative.

A few thoughts on this:

Impact on Token Usage and Response Length:

Incorporating extensive .cursorrules content into the model's context likely affects token consumption, influencing the length and completeness of responses.

Assuming:

  1. Inclusion of .cursorrules:
    The content of the .cursorrules file is assumed to be included in the prompt sent to the model; if so, it consumes a portion of the available tokens and reduces the number available for the response.

  2. User Prompts and Code Context: Detailed prompts and extensive code snippets also contribute to the token count. The more comprehensive the input, the fewer tokens remain for the LLM’s generated output.

Model Token Limits: Each model has a maximum context length, encompassing both input (prompts, .cursorrules, code context) and output tokens. I tried testing identical models with different context windows. Although the results were inconclusive, they seemed to indicate that .cursorrules influences the token budget.
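As a rough back-of-the-envelope sketch of that budget (the ~4 characters per token ratio is a common heuristic for English text, not an exact tokenizer count):

// Estimate the input tokens a rules file consumes on every request.
import { readFileSync } from "node:fs";

const CHARS_PER_TOKEN = 4; // rough heuristic, not a real tokenizer
const rules = readFileSync(".cursorrules", "utf8");
const approxTokens = Math.ceil(rules.length / CHARS_PER_TOKEN);

// e.g. a 250-line file at ~60 chars/line is ~15,000 chars, so roughly
// 3,750 tokens spent before any prompt or code context is added.
console.log(`~${approxTokens} tokens consumed by .cursorrules alone`);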

Implications:

Reduced Response Length (if you force-prompt it to use the rules): as more tokens are consumed by .cursorrules and the input context, the LLM's response length may be curtailed to stay within the model's token limit.

Potential for Truncated Outputs: If the combined token count of the input and the desired output exceeds the model’s capacity, the LLM might truncate its response or fail to generate a complete answer.

Totally opinion-based Best Practices:

Optimize .cursorrules: keep the file concise, focusing on essential guidelines to minimize token usage. Use @docs; it is better than detailed code rules for each framework used in the current project.

Some users maintain very detailed .cursorrules files. In my experience, the usefulness and degree of detail in these files also depend on the project's type and language. For instance:

Worked → JS & Deno 2 projects: for JS projects, including the basic file and folder structure in the .cursorrules file seems beneficial. Using a tool like tree -L 2 to generate a directory structure and adding brief explanations improved the assistance (see the sample after these two examples). I suspect this is even more effective in 0.43.

Not working → Data & Column Information: Providing clinical information about column names and structure in both .cursorrules and notebooks did not enhance Cursor's performance. As an experiment, I used MS GraphRAG to build a separate KG system, which was very helpful for creating functions and prompts; that is to say, it would be a helpful integration in Cursor. Perhaps with a better understanding of Cursor's dot files it would be possible to add that type of information.
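For the JS case above, the directory-structure section of a .cursorrules file might look like this (the layout itself is illustrative):

# Project layout (tree -L 2), with one-line notes:
.
├── src/
│   ├── main.ts     # entry point
│   └── routes/     # one module per endpoint group
├── deno.json       # tasks and import map
└── tests/          # one test file per module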


Thank you very much for sharing yours, @FinDevAI.

So many good approaches from so many people it’s awesome :smiley:

Sorry if I'm out of my depth in this reply.

LLMs have separate context lengths for input and output tokens.

Input tokens don't eat into output tokens; otherwise Gemini's 2-million-token context window would also give you equally long output, but it is limited to just 8k output tokens.

Generally the input limit is much bigger than the output limit (and input tokens are cheaper too).
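To make that concrete, here is a minimal sketch using the @google/generative-ai Node SDK; 8,192 is the documented output cap for Gemini 1.5 Pro, separate from its ~2M-token input window:

// Minimal sketch: input and output limits are configured independently.
// Assumes @google/generative-ai is installed and GEMINI_API_KEY is set.
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({
  model: "gemini-1.5-pro",
  generationConfig: { maxOutputTokens: 8192 }, // output cap only
});

// The prompt can be huge (up to ~2M tokens of input), but the reply
// can never exceed the 8,192-token output limit set above.
const result = await model.generateContent("Summarize this .cursorrules file: ...");
console.log(result.response.text());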

About anything else you said: yeah, super-extensive cursorrules do seem to fill the context length, and as we all know, the more you feed it, the more it will forget some things, despite what the LLM providers claim with their metrics.

Datachaz on Twitter and others did nice needle-in-the-haystack tests, which show that no matter the model, performance degrades with more input.


Hi Eddy,

I couldn't find the needle-in-the-haystack tests by Datachaz; would love a link if you have one. On input vs. output tokens, you're right that models have separate limits, but in tools like Cursor it's tricky to evaluate the input token size, especially when code is included, and perhaps some parts of the rules. From experience, these can add up fast, at least from what I can see using Cline.

As for .cursorrules, and prompts in general, concise inputs tend to perform better. Iterative workflows are still more effective for coding tasks than loading up 250 lines of rules and tasks, which often leads to errors and context loss. As you mentioned, extensive inputs seem to degrade performance across all models, and how much the model retains, and when it loses it, is not tracked on the user side. It would be nice to see the token sizes in Cursor.
Also, being able to see the full prompts sent and received could be helpful for evaluating one's current workflow.


Interestingly, the Composer was reworked in 0.43.5. Some might use Composer notebooks for large projects, which seem to have been removed in 0.43.5? Or is it just me (you never know with Cursor)… As a substitute, I found that several notes can be added like a “notebook” and understood by the agent.

Here are a few examples I tested:

  1. A list of mathematical formulas that can be referenced directly by name, which surprisingly worked.

  2. A .txt file with a list of steps (checkboxes) used to track development and other project details (see the sample steps file after the column example below). You have to reference the file or pin it; then you can use it simply as a notebook:
    “What are the next steps?”
    Or:
    “Suggest 3 different packages to accomplish step 4.”

  3. If you have many categorical columns or related feature information in a large .parquet, CSV, etc., you can keep a .txt file with detailed information about the columns, and the agent can then use the additional information to understand the context. For example, the .txt contains lines like:
    drug_1_why_1: dose change reason - Low WBC/Neu
    drug_1_why_2: dose change reason - Low thrombocyte count
    drug_1_why_3: dose change reason - High ALAT
    drug_1_why_4: dose change reason - High WBC/Neu
    drug_1_why_5: dose change reason - prevent toxicity during MTX infusion

prompt:
which column starting with drug_X_why is related to white blood cell count?

reply:
let me check the column descriptions from the column information.

According to the column_info.txt, among the drug_1_why columns:

drug_1_why_1 is related to white blood cell count - specifically described as “Azathioprin change reason: Low WBC/Neu” where WBC stands for White Blood Cell count and Neu for Neutrophils.

There’s also drug_1_why_4 which is related to “High WBC/Neu”.

So there are two columns related to white blood cell count:

  • drug_1_why_1 for low WBC counts

  • drug_1_why_4 for high WBC counts
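As mentioned in item 2 above, a steps file used as a notebook might look like this (the contents are purely illustrative):

[x] 1. Load the raw registry data
[x] 2. Clean column names and types
[ ] 3. Build the per-patient feature table
[ ] 4. Export features to .parquet for modeling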


This is also new, for me at least: the ChatGPT Mac application, using o1-preview and o1-mini directly for free (well, within weekly limits).
Nice for bigger projects, so I will add it here in case some of you didn't know about it and think “ahh cool, let me try that”.


My bad, I attributed it to the wrong person; it was from Greg Kamradt.

Granted, this one is older and used Claude 2.1; the newer models supposedly do better, with very few errors, and Sonnet 3.5 is currently the best of the Claude models (though they also trained on many such tests, so we can't be certain the evals are correct).

Below is the latest Gemini one I could find:

The Needle in the Haystack Test and How Gemini Pro Solves It | Google Cloud Blog


Wait, what's that? A Cursor extension?

Yes, it is OpenAI's ChatGPT app for Mac. If you don't know, they added something Claude-like that can execute terminal commands. Basically, it is just an enabled screen reader, so you can connect the app directly. But it was only working with VSCode until last week, I think.
It reads the selected working pane, so you can interact without copy-pasting.


It's available for Windows too, as I've seen, but without VSCode support. Not yet, I guess.

Notebooks now live in the left pane below the file tree. I can only @ files in them, so they're not much use (no @Docs) IMO. I prefer *.md files, since you can git those.