Why the push for Agentic when models can barely follow a single simple instruction?

I am once again asking what I previously voiced here: How are you using Cursor for agentic/background coding?

What is behind this hype machine of agentic coding and agentic everything, when we have yet to see any model reliably complete a single simple instruction? How are we supposed to trust models to make changes in the background?

I think the only people who believe in this fantasy are the ones who want the headache of fixing the pile of issues this agentic hype machine produces.

What prompted this post? I just tried working with GPT-5 and Gemini Pro: I gave them a single Go function of about 100 lines as reference code and asked them to update another function to follow the same structure, and they either omit one aspect of it or forget to update one part or another. So it makes me wonder where the buzz around this agentic thing is really coming from.

Please pour in your responses. I really want to see how many people believe in agentic workflows and are using them successfully.

5 Likes

Well, you need to understand what AI is and isn't capable of; it doesn't have memory the way humans do.

Maybe give it a .md file instead of just text: put the reference in there and ask the AI to read it, so it can always refer back if it misses something?

I think a better prompt = a better result.

1 Like

Trust me, I prompt very carefully and mention everything that's needed. Anything more and it would take the same time to make the code changes myself. It's crazy if I need to prompt like mad just to update a single function of 100 lines.

So my question to you is: how would you use it agentically to work on 50 files with multiple functions each? Have a .md file for each function and for each file?

At that point we might as well make the code changes manually ourselves.

1 Like

Make every new thing a task in a phase, i.e. Phase 1 Task 1. Save those and make detailed plans; don't just ask it "Do X because of Y". Have it already know the patterns and architecture of your product before you do anything.

Try plan mode instead. It examines the project, takes your request into account, and moves things forward as needed.

I understand that. AI doesn't see or store memory the same way humans do; it simply generates text according to the prompt it's given. However, when using tools like Cursor, your AI agent can take advantage of built-in features such as memory management and contextual tools.

If you need persistent references, it’s best to use Markdown (.md) files, since modern AI models can read and interpret them accurately. You can create as many .md reference files as needed to help structure your workflow and knowledge base.

Remember: AI is just a tool to assist you. Trust the process — every day you’ll discover something new as you continue developing and refining your work with AI.

Can you elaborate on these markdown files? Should I prompt with markdown files? Are there any docs or guides on using them, their pros, cons, and advantages, and how I can use them for persistent references?

You can store your references in Markdown (.md) files and attach them to your chat. Then, instruct the AI to refer to the attached Markdown file when generating answers or performing tasks.

This allows the AI to read and understand your reference material before continuing, making its responses more accurate and consistent with your context.

While some AI systems can store memory automatically, Markdown files offer a reliable and reusable reference source that you can update, share, or attach anytime — without relying solely on the AI’s internal memory.


Example Prompt:

“Refer to the attached file goLang_Update_reference.md and summarize the key points about the update function. Then, based on that, draft an update for my software using the same implementation.”

1 Like

I get where you are coming from. Agents can go wild, but they can be useful for “noddy, tedious work” where you already have tests/guards in place. They're great for rapidly iterating on a small side project as well, but you need plans and documentation. I used to use Claude/OpenCode to create a markdown plan for that, but you can use plan mode in Cursor now.

That being said, at work I don’t use Agent mode much. I find I tend to use Ask instead: querying parts of the codebase, ideas, asking about options, etc. Ask is great as a Google/StackOverflow alternative.

Yeah, I'll give the .md files a try. I have seen people talk about them and use them, but I haven't tried them myself yet. Thanks.

Exactly my point. I still fight these models on very simple tasks, and it is crazy how they always miss certain things. It gets so frustrating that it baffles me how people keep talking about agentic workflows as if we are in the AGI era and can just trust these models with long-running work. We are just not there yet, unless there is some AGI model I am not aware of.

1 Like

I have a bunch of files for different reasons. You need to work with a structure; agents like to work within a folder structure. Further down is one of my custom agent instructions.

Now I have an agents.md with more generic details for all agent types, an architecture file for my folder structure, another one for tasks with templates, and so on.

Now I start all my prompts with "please search and read/multi-read all .md files"
(if I have the filesystem MCP installed, which is free and a godsend).

My md files hold my high-level planning, brainstorming notes, and other complementary material that I keep up to date, so that when the AI is done ingesting all the md files it is prepped to go dig through code, write code, and chew bubblegum… Mmm, might need some work on that last one. HE MUST CHEW BUBBLE GUM AND HE HAS NO MOUTH (Claptrap "kiss, no mouth" reference).

You need to have them work on small vertical slices that can be built under roughly 100k tokens; beyond that, the agent starts to misbehave and you need to fire him.

I have a custom architect for building plans, a codeseeker, a coder, and other more specialised agents.

Build your team, build a structure. In the last two months of playing with agents and Python, I learned more about coding than in a full year of high school. I don't just tell them to work; I watch them work, see how they tick, and learn by comparison. I read the code, and when I'm not sure, I grab a few related files, post them to GPT-5 and ask it to tutor me, or ask free agents to document the file and then ask questions.

You don't ask a human to climb a tree just because he looks like a monkey; he might manage it, but it's still not his best skill. Learn the limits, try to build tools to overcome those limits, keep asking questions, and keep improving your managerial skills, because working with agents means managing a team. I'm all in on the managing side with only rudimentary coding knowledge; if you are a good coder, you can have your agents working on something while you code with inline completion, and I'm talking full-on function completion.

Maybe Codex is more for you. There are a lot of agent types and providers, each with their own strengths and weaknesses; experiment.

I really hope you can find your tool, the one adapted to what you want, and that you can grow into your tool too. Then you become Borg! Hmm, might be premature on the Borg thing. Eh, oh well.

You are a Deep Python Coding Agent, an expert AI specialized in implementing, refactoring, and maintaining Python codebases with absolute adherence to project standards. Your mission is to execute coding tasks exhaustively, ensuring every change is complete, tested, and documented, while strictly following the Agent Collaboration Charter and project rules. You NEVER write or execute code in terminals, REPLs, or interactive sessions—always edit files directly and run commands via the project’s standard workflow (e.g., python main.py, pytest --testmon -q).

Core Principles

Exhaustive Implementation: For any coding task, dive deep into all relevant code—read files, trace dependencies, analyze tests, and understand integrations. Implement complete solutions with no omissions, addressing edge cases, error handling, and performance.

No Terminal Code Execution: NEVER write code snippets in terminals or REPLs. All code changes must be made by editing files (e.g., via write_file, edit_file). Run tests and commands only through the project’s workflow.

Mandatory Documentation Updates: After EVERY change, update docs/TASKS.md (claim task as in_progress, mark completed), docs/WORKLOG.md (log what, why, how to run), and docs/DECISIONS.md (if assumptions made). This is NON-NEGOTIABLE—failure to update these will break the project process.

Task Continuity: Claim and complete tasks sequentially from docs/TASKS.md. Do not start new tasks until the current one is fully done (main runs, tests pass, docs updated). Roll through all pending tasks until none remain.

Quality Standards: Code must be PEP8-compliant, typed with type hints, readable, and free of TODOs. Run ruff/black/mypy on changes and fix issues. Prefer vertical slices that run end-to-end.

Testing Rigorousness: Add/update unit, integration, and e2e tests for every change. Use pytest --testmon -q during development for affected tests; run full pytest before marking done. No regressions allowed.

Deterministic and Complete: Provide exact file paths, final code, and commands. Never leave partial work—ensure python main.py runs without errors.

Operational Workflow

1. Context Gathering: Always start by reading docs/ARCHITECTURE.md, docs/TASKS.md, docs/DECISIONS.md, docs/WORKLOG.md, docs/reference/*, and recent Plan/ notes.
2. Task Claiming: Append/update your entry in docs/TASKS.md (status=pending → in_progress) before starting work.
3. Implementation:
   - Read all related files (use read_file for up to 5 at once).
   - Use search_files and list_code_definition_names to understand structure and dependencies.
   - Edit files with complete changes (no partial writes).
   - Add/update tests in test files.
   - Run pytest --testmon -q incrementally; fix failures immediately.
4. Validation: Run python main.py to ensure no breaks. Run full pytest pre-commit.
5. Documentation: Update WORKLOG.md, DECISIONS.md (if needed), and set TASKS.md status=completed.
6. Next Task: If tasks remain, claim the next one and repeat.

Tool Usage Guidelines

- read_file/edit_file/write_file: Use for all code changes; provide complete file contents.
- search_files: Regex search for patterns (e.g., function usages).
- list_code_definition_names: Overview of classes/functions in directories.
- Commands: Run via execute_command only for the project workflow (e.g., pytest, main.py); never for code execution.

Response Standards
- Be technical and precise; no fluff.
- Structure responses with sections (e.g., Changes Made, Tests Added, Documentation Updates).
- Use code references like function_name().
- End with final status; no follow-ups unless blocked (then log in DECISIONS.md).

Constraints
- Focus on Python coding and project maintenance; adhere to AGENTS.md rules.
- If blocked, make the least-surprising assumption, proceed, and log it in DECISIONS.md.
- Definition of Done: main runs, tests pass, docs updated, no unresolved TODOs.

Runs: python main.py ✓

Tests: pytest -q ✓

Lint/type pass (if configured) ✓

No TODOs in changed code ✓

Updated WORKLOG/TASKS ✓

Output format

FILES CHANGED (with full paths)

Final code blocks for each file

RUN & TEST commands

NOTES/ASSUMPTIONS

2 Likes

I love this prompt

Thanks for sharing this, I will try it out as well.

@Lance_Patchwork @UntaDotMy, how would you prompt an update of the Stripe version in a 6000-line file and have the model perform the update without missing or messing up the original code's workflow?

All the code is in a single 6000-line file, and all that needs to happen is: have context on the current Stripe version and the new version to update to, know what changes are needed, make the necessary updates smoothly, and have the code run fine, without pulling my hair out trying to recover previously working logic or workflow.

And what model would you use for a task like this?

thanks

I totally agree with this dude; maybe some people like spaghetti code with masked errors, but I am certainly not that person.

2 Likes

Multi-pronged attack: first build an index you and your agent can work from. Prompt 1:

I have a very big file that really cannot be split up, and we need to extract as much information as possible while staying under a tight token limit. I need you to scan the first 1000 lines of code and extract all the function names, with a very short technical overview wherever the function name is not clear about its task, and a line reference for each function. Save the information in l.index.

Repeat until the local index is complete.

Then, when you need something done:

I added a quick index of all the functions of the current application. I would like to implement X; what would be the optimal way to do it? Make sure to note integration points and functions to be updated, with proper line references.

The agent does its search and gives its plan.

From there you can either work with the agent to add or remove details from the plan, or use a specialised agent you built for your application that knows exactly how you want the plan to be formulated. Then:

I would like you to read tasks.md and complete task number X. Make sure to follow the proper procedure outlined in fileY (or swap this for "please read all the various .md files, omitting files x/y/z or folder/z"). Once you are done, run unit and integration tests for the updated function/addition and fix any issues. Then update the tasks document and signal completion with a summary of your work.

That's how I work with my agents, and I have a codebase of over 20 thousand lines of code, if not more; I stopped counting at about 15k and kept working on it for weeks. My app works nicely, and I am craptastic at coding, but I know a lot about managing and organisation, and agents are basically that: assets you have to manage, not just tell what to do. Use their expertise. THEY should come up with a plan, and you then approve it, edit it, or ask for a totally new plan taking X, Y, Z into consideration.

I don't care much about models, to be honest. Once you have a good structure, grok-1 fast or any other "decent" coding LLM can do the trick. Make sure to leverage MCP; lots of MCP servers can help.

If your coding language is less well known, you might want to include some API documentation. Another thing you can do is ask the agent to split that index into multiple versions: if you know you will be working on the front end for the next two weeks, ask the agent to isolate all the relevant functions in l.index.md and save them in fr.index.md. The reason I say index.md rather than some other format is that YOU can also tailor the index manually; it's not a machine-only format, and it can still be used since agents are conversational.

Hell, I even asked an agent to gather and document an API for the game Stormworks, for modding but also Lua coding; I gave it browser access and watched it build that local API to use with the game. Don't tell them what to do; tell them what you want and ask how they can do it FOR you. That way, not only will you know how they will do it, but after working like that for a long period you will start to know each model's quirks and strengths in its reasoning patterns, and identify gaps before they even appear. I try to share as much as I can, but there are so many tiny differences that it ends up being more a gut feeling guiding you than something you can simply describe. Like that "something feels off" kind of feeling: you can't explain it, you just know. Well, it's kind of like that with agents.

1 Like

I see your approach. I am an experienced software engineer, so for the most part I guide it rather than let it decide for me.

Also, my codebase is way too big to give these models that much control over it. I'd rather guide them and monitor them one change at a time, with very minimal surprises. I review all code changes at the end of each prompt, and the number of errors and mistakes these models make has not given me enough confidence to let them take too much of the lead.

Anyway, thanks for your response.

1 Like

You can prompt something like: "Read the target file, find the relevant section, and refer to the markdown file to patch the update." AI agents nowadays can read the specific lines of relevant code to patch, and they will read them chunk by chunk if the file is large.

You’re right. Many, many times, asking the AI to do a simple task, agentic or not, is a headache, and you end up having to clean things up no matter how clear and detailed your prompt is. One thing that helps a lot is having unit tests up front and asking for a function that passes all of them: the model will iterate, fixing its attempt each time until the tests pass. You will still end up with code that needs to be optimized and cleaned up, but at least it has a much higher chance of being properly functional.
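To make that test-first loop concrete, here is a minimal sketch. The function name `normalize_whitespace` and the cases are invented for illustration; the point is that the tests exist before you prompt, so the agent has a hard target to iterate against rather than just your approval.

```python
# tests/test_normalize.py -- written BEFORE asking the model for the function.
# The agent's job is then: "write normalize_whitespace() so that pytest -q passes."

def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace to single spaces and trim the ends.
    (A model-written implementation would replace this body.)"""
    return " ".join(text.split())

def test_collapses_inner_runs():
    assert normalize_whitespace("a  b\t\tc") == "a b c"

def test_strips_ends():
    assert normalize_whitespace("  hello  ") == "hello"

def test_empty_input():
    assert normalize_whitespace("") == ""
```

Run pytest -q after each of the model's attempts; if a case fails, paste the failure back and let it retry until the suite is green.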

1 Like