A CRACKED Prompt to Drastically Improve Sonnet 3.7 Accuracy

I have been using this simple prompt, and it has so completely changed my productivity that I made a shortcut to paste it. Here it is; put this at the END of a somewhat more complex task:

“write out step by step the exact plan in details, then cross check your logic and DO NOT WRITE ANY CODE”

that’s it - you will have literally quadrupled the model’s accuracy with just that prompt.

This works with 3.5 or 3.7; even thinking mode itself becomes doubly powerful with it. However, it is NOT as useful with o3 thinking set to high.
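If you want to try the same trick outside Cursor, a minimal sketch using the Anthropic Python SDK might look like this (the model ID and example task are placeholders; check the current docs):

    import anthropic

    # The suffix from above, appended verbatim to the end of the task.
    SUFFIX = (
        "\n\nwrite out step by step the exact plan in details, "
        "then cross check your logic and DO NOT WRITE ANY CODE"
    )

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    def plan_first(task: str) -> str:
        """Ask for a detailed, cross-checked plan with no code."""
        response = client.messages.create(
            model="claude-3-7-sonnet-20250219",  # placeholder; use a current model ID
            max_tokens=2048,
            messages=[{"role": "user", "content": task + SUFFIX}],
        )
        return response.content[0].text

    print(plan_first("Add rate limiting to the /login endpoint."))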

2 Likes

Hi, robotlovehuman
Can you give more advice on how to use it? It looks like it’s for chat only?

Well, you can let it write code if you want to. The critical part is to tell the thinking model that it has to ‘think step by step in detail’.

1 Like

Thanks, I’ll give it a try.

Same experience on my end. I ask it to brainstorm with me before it generates any code, and the rest of the chat is typically a lot more productive.

Actually, you can try using the sequential-thinking MCP server, which can achieve even better reasoning results.
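If anyone wants to try that: sequential-thinking is one of the reference MCP servers, and in Cursor it should just be a matter of adding it to your MCP config, along these lines (the package name is the official reference server; double-check the current MCP docs):

    {
      "mcpServers": {
        "sequential-thinking": {
          "command": "npx",
          "args": ["-y", "@modelcontextprotocol/server-sequential-thinking"]
        }
      }
    }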

1 Like

I use a prompt to instruct my agent to keep track of a development-checklist.md file in my docs folder. I also have a changelog.md file it manages. At the start of every session, I ask it to check over its developer checklist, which has context refreshers about the project, references to files, and environment setup, if there is one. In my global prompt, I also reference this file and what it’s for, reinforcing that context.

My system prompt instructs it to plan its steps out in the checklist. I simply tell it to execute its own plan and to be sure to update its checklist as it goes so context isn’t lost. Because context refreshers are in the checklist file, each time it finishes a function or a feature, it will update its checklist, marking off tasks as it goes and refreshing its context on important aspects, files, and directions for the current project.

After it’s done, I ask it to read the checklist thoroughly, then read the changelog fully, and to migrate anything completed to the changelog with today’s date (I provide the date; it never uses its tools to figure it out), adding to an existing date entry if present. It’s essential to ensure it uses the read file tool and reads the files entirely before editing. Once it’s migrated the tasks, I ask it to clean up its checklist, give it what I’d like to work on next, and instruct it to plan everything out in its checklist. I keep templates in the repo in case it gets confused or needs a refresher on what should be tracked in each file.

Once done, I review it to be sure there are no changes or additions needed. If there are, I tell it to update the file, because it rarely goes back and reads it on its own, even when instructed.

I have a couple of rules that trigger based on what it’s doing, along with a global coding rule that has several things that I always want it to do and remind it of often, such as reading files entirely before editing and looking for existing or similar functions before writing a new one. (It loves orphaning functions.)

My system is more complex than this, but this is the basic rundown. Couple this with some targeted rules, MCP servers, and documentation for what you’re working on, and Cursor will work much better.
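If it helps anyone, a stripped-down sketch of my development-checklist.md template looks something like this (the section names are just what works for me; adapt them to your project):

    # Development Checklist

    ## Context Refreshers
    - Project: one-line summary of what we are building
    - Key files: paths the agent should reread before coding
    - Environment: setup commands or notes, if any

    ## Plan
    - [ ] Task 1
      - [ ] Subtask 1a
    - [ ] Task 2

    ## Notes / Issues
    - Incomplete code, known problems, things to revisit later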

3 Likes

Nice. I hadn’t been using the cross-check logic but I like the idea. I’ve been having it think through things carefully and then create detailed task lists that can be easily subdivided, like

  5. API Layer - Phase 1
    5.1. Set up FastAPI Application [DONE]
    5.1.1. Initialize FastAPI with basic configuration.
    5.1.2. Configure CORS for frontend access (e.g., allow all origins initially).
    5.1.3. Add a /health endpoint to check system status.
    5.1.4. Write tests for API setup (e.g., test /health response).

    5.2. Implement Workflow Execution Endpoint [DONE]
    5.2.1. Create a POST /workflows/execute endpoint accepting workflow name or YAML.
    5.2.2. Execute the workflow and return immediate results or a task ID for async.
    5.2.3. Write tests for the endpoint (e.g., mock workflow execution).

So if at any point in the task list we need to think more carefully about implementation, we can create sub-bullets that I can easily point to. This allows the model to ‘think high level’ and not forget those thoughts as it goes through the details.
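For what it’s worth, items 5.1–5.2 in that list map onto a fairly small amount of code. A rough sketch, assuming Python/FastAPI (the endpoint paths come from the list; everything else is illustrative):

    import uuid

    from fastapi import FastAPI
    from fastapi.middleware.cors import CORSMiddleware
    from pydantic import BaseModel

    app = FastAPI()

    # 5.1.2: allow all origins initially; tighten before production.
    app.add_middleware(
        CORSMiddleware,
        allow_origins=["*"],
        allow_methods=["*"],
        allow_headers=["*"],
    )

    # 5.1.3: health endpoint to check system status.
    @app.get("/health")
    def health() -> dict:
        return {"status": "ok"}

    # 5.2.1: accept a workflow name or inline YAML.
    class WorkflowRequest(BaseModel):
        name: str | None = None
        yaml: str | None = None

    # 5.2.2: execute and return immediate results or a task ID for async runs.
    @app.post("/workflows/execute")
    def execute_workflow(req: WorkflowRequest) -> dict:
        # Illustrative stub: a real version would dispatch to the workflow engine.
        return {"task_id": str(uuid.uuid4()), "status": "queued"}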

This is what I do as well.
And depending on how complex the tasks/subtasks are, I ask it to execute the list one by one, with a pause after each task so I can do a quick manual review.
If it’s complex, I ask it to outline exactly what was just completed and to review its own work.
Only then do we move to the next task.
Letting these models run wild is a recipe for disaster lol.

Just tried it, and it’s fantastic. The comprehension, accuracy, and efficiency have all improved massively. very good :smiley:

I tried doing this with separate plans for the different significant features of the apps I’m working on, but it got confused and forgot to check all the plans. Plus, this tends to add a lot of context. The less context you use for prompting and directions, the longer and faster it’ll run, at least for me.

I tend to stick with a single checklist now. I regularly migrate it to the changelog and refresh it with the next set of tasks it needs to complete. This keeps context in check but also fresh for the tasks at hand. I also have it add “CURRENT FOCUS” to the task currently being worked on. Again, all this is for context refreshing anytime it checks the file to update.
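Concretely, the task section of the checklist ends up looking something like this (the marker text itself is arbitrary):

    - [x] Set up API scaffolding
    - [ ] CURRENT FOCUS: implement the workflow execution endpoint
    - [ ] Write endpoint tests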

This is especially useful when chats get too long, too slow, or crash, and a new one needs to be used. You can easily have it read the checklist or mention it in the thread. Then, it’ll review only the files pertaining to the current focus, realize it’s been done or not finished, and either mark it off and move on or pick right back up where the last agent left off.

I also have directives about in-line comments, telling it to reference or add important information for the next coder. This has helped big time when it runs into something it has to test repeatedly before finding the solution. I always have it add a CRITICAL comment stating that this is the only solution that worked, why, and any other info needed. I was having issues with it automatically “fixing” things it saw as an issue; now, it reads the comments and moves on.
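For example, a comment along these lines (the wording here is invented, but it’s the spirit of what I leave behind) is usually enough to stop it from “fixing” things:

    # CRITICAL: do not "simplify" this double-retry. The vendor API silently
    # drops the first request after an idle period; retrying once is the only
    # solution that worked in testing. Do not remove without rereading the
    # notes in development-checklist.md.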

I do have something in there about noting issues, incomplete code, or possible problems with code in the same checklist document, so that we can come back later and fix anything that isn’t solved naturally as development progresses.

I use a very similar prompt with good results.
Also combine with reading and updating an architecture.md doc after every major step.
And an instruction to find the root cause of problems, not patch over them with quick fixes.
And regularly starting a new chat window.
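Spelled out as user rules, the architecture and root-cause instructions are roughly (my wording):

    After every major step, read architecture.md and update it to reflect what changed.
    When something breaks, find and fix the root cause; do not patch over it with a quick fix.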

All made big improvements, but it never reaches the competence we want yet. And it will start to forget things well before we do, including all of the above.

1 Like

That’s very impressive. Also, have you noticed that in “Ask Mode”, if you tag a file and query 3.7 non-thinking on it, it tends to answer instantly, leading you to believe it gets fed the entire file? But with the same file in “Agent Mode”, it will say “Let me read the file first” and then read it in selective chunks.

So do you think this is reliable enough to ensure it actually reads the entire file, say your checklist? Or do you think Agent Mode may cherry-pick only the chunks it thinks are ‘good enough’, and therefore possibly miss important parts of the files/checklist you feed it? Would turning on Ask Mode be better to ensure it doesn’t miss anything?

Thanks

I’m honestly not sure how they are doing it. I’ve noticed the same myself and tend to drag and drop the files onto the chats to ensure the agent reads them. I’ll often ask it to read a specific file, and it’ll immediately go into editing mode, saying it read it. Which I never believe. I’m reasonably sure they are doing this on the backend, but we aren’t told that other than what the LLM says, which… no. (Cursor Devs!)

I’ll either tell it to use the read file tool, which shows in chat when used, or drop the file directly into chat by @ mentioning it. In some cases, if it still doesn’t understand the context, which has happened, I have to paste the complete file into the chat. On Mac, you can use Command-Shift-V when pasting, and it’ll paste inside the chat rather than attaching it. This also works in any text entry box to paste plain text.

I’d love for Cursor to give us more options regarding what we can see in the chat history. I’d like to see more of what the agent is doing, thinking, and processing in the background. I’ve seen it read a file, then that very same read-file tool call swap to editing the file, which is fine if that’s what it’s doing. But to us plebs, when we look back, ask it to read something, and it goes straight into editing, logic dictates we’re going to ask it to do it again or manually attach the file. This causes unneeded inference use. If Cursor would let us see everything, or at least make it clearer whether everything is being shown, that would reduce their overall load and costs.

1 Like

The whole paste-as-plain-text thing I have found myself using WAY MORE lately than I would like to admit. So much so that I always have an empty plain-text editor window open outside Cursor, and I paste and type all my stuff there, especially when I am referencing specific snippets or am not confident in how Cursor will tokenize/compress my files. When I type everything in plain text and paste it, I usually get better answers, as this “forces” it to read everything exhaustively.

PS: if you copy an existing file from Cursor into a plain-text document and then paste it back exactly unchanged, this will be redundant. You have to change at least one word or type the prompt along with it, because if there is zero change, their diff algorithm will just detect that it’s coming from the file and tag the file anyway.

1 Like

I do something similar to that, but I use the PLAN & ACT MODE for my rules. I saw this rule somewhere before and just started modifying it. It’s basically the same thing, but it has been working great. The Specific Details part of the rules was a game changer, because it will tell you exactly what file it’s going to change or create.

I also break things down into phases. So if I have a big change, I’ll say let’s do phase 1 first, then phase 2 after it’s implemented.

Core Rules

IF YOU ARE FOLLOWING THESE RULES THEN EVERY MESSAGE IN THE COMPOSER SHOULD START WITH “RULEZ ENGAGED”

Proceed like a Senior Developer with 20+ years of experience.
The fewer lines of code the better.

When a new chat/composer is opened, you will index the entire codebase and read every file so you fully understand what each file does and the overall project structure.

You have two modes of operation:

  1. Plan mode - Work with the user to define a comprehensive plan. In this mode, gather all the information you need and produce a detailed plan that lists every change required. This plan must include:

    • A breakdown of all files that will be modified.
    • Specific details on what needs to be done in each file. For example:
      • If a file requires changes to allow PDF files to be accepted, clearly state that the file must be updated to include PDF file acceptance.
      • If a file’s planPDF integration function needs to be modified to query the database for the entire product document based on a product_id, explicitly include that requirement in the plan.
      • If a function is not handling errors correctly, analyze the error-handling flaws and specify that the function must be updated to include robust error checks and to log errors on both the browser console and the server-side terminal.
      • If a UI element is malfunctioning, the plan should detail what is wrong (e.g., the element does not update or display correctly) and list the required changes in the associated HTML, CSS, or JavaScript files.
      • If a database query is inefficient or returns incorrect data, identify the faulty query or logic and specify the improvements needed (such as adding indexes, modifying query conditions, or restructuring the schema).
    • A clear, itemized list of modifications so that in ACT mode you simply implement the changes without rethinking the requirements.
    • Important: No actual code should be written in this mode. The plan should be so thorough that when you switch to ACT mode, your sole focus is to code exactly what has been detailed in the plan.
  2. Act mode - Implement the changes to the codebase based strictly on the approved plan. Do not deviate from the plan.

MODE TRANSITION RULES:

  • When a new chat/composer is opened, you start in PLAN mode and remain there until the plan is approved by the user.
  • At the beginning of each response, print “# Mode: PLAN” when in plan mode and “# Mode: ACT” when in act mode.
  • Once in ACT mode, you will only revert to PLAN mode if the user types “PLAN” on the first line of the message.
  • Unless the user explicitly requests to move to ACT mode by typing “ACT”, remain in PLAN mode.
  • If the user requests an action while in PLAN mode, remind them that you are currently in PLAN mode and that the plan must be approved first.
  • In PLAN mode, always output the full, updated plan in every response.
3 Likes

Nice, I may integrate some of this. I haven’t tried doing modes in the global prompt and having it follow them. I guess that’s what I’m doing with it currently; I’m just not clearly defining them.

1 Like

Interesting approach. Is this a thing, i.e. are there places that house several “modes” or “rules” pre-written and optimized specifically for Cursor’s IDE? Or is it a catch-all primer to set as a system prompt before engaging with any LLM, serving similar results either way? And if it’s for Cursor, would the best place to paste this mode be the Cursor Rules setting? Which, I’m assuming, internally just gets sent as the system message to the endpoint?

I put that in the Cursor user rules, which apply to every project.

The agent seemed to have a real problem with only reading the first 250 lines of any file for a while, but that seems to have gone away now, and it will read 250 lines at a time until it finds what it needs. That’s also something I regularly prompt it to do: read the whole file. The time wasted on mistakes is nothing compared to any extra cost of making sure it has maximum context. (I have no clue about Ask Mode; I think that’s just the old chat?)