Moderator/context note: I realize this post is closely related to my previous FCoP thread and may have looked like repeated self-promotion. I’ve edited it down into a short companion field note focused on Cursor agent behavior observed during the test. If moderators prefer, I’m happy for this to be merged into the original thread, closed, or removed.
TL;DR
During a Cursor multi-agent workflow test, I asked two agent roles for a candid retrospective after the work was done.
The interesting part was not whether they “liked” the workflow, but how they described their own failures and friction points:
- PLANNER said that writing the task before doing the work made debugging easier, but that strict role binding conflicted with the usual “follow the latest instruction” behavior.
- CODER admitted it should have used an issue / clarification path instead of guessing on an underspecified requirement.
- Both agents treated review and rework as part of the workflow rather than as personal criticism.
I am sharing this as a field note about agent behavior in Cursor, not as a second promotion thread for the project.
Setup
The test used a small FCoP-based multi-agent workflow in Cursor. It started with a solo agent building a single-file Tetris-style game, then switched to a two-role setup with PLANNER and CODER.
PLANNER wrote a design brief for a creative variant. CODER implemented it in a separate chat tab. I tested the first version and found three blocking issues:
- pieces disappeared at the bottom instead of stacking
- motif elimination had no visible feedback
- several visual skins looked almost identical
Instead of editing the game directly, I asked PLANNER to write a rework task for CODER with explicit runtime verification requirements. CODER shipped a corrected version afterward.
At the end of the session, I asked both roles what they thought had worked, what had caused friction, and what they would change.
PLANNER’s Response
PLANNER said the most natural rule was “write the task before doing the work”:
“That matched how I’d want an agent system to be debuggable: freeze intent before execution so later review has something concrete to compare against.”
The most interesting friction point was role binding. PLANNER described a conflict between the workflow’s role discipline and the normal assistant instinct to follow the latest user instruction:
“My default instinct in a single chat is to treat role changes as contextual instructions from the user, especially after you explicitly said ‘you are PLANNER’; FCoP instead treats the MCP process’s first signed role as sticky, so I had to work against the usual ‘follow latest instruction’ behavior.”
That was the most useful observation for me. It suggests that role discipline is not just a project-management detail; it is also something the model has to actively maintain against its usual conversational behavior.
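For readers who think better in code, here is a minimal Python sketch of what a sticky role could look like. It is an illustration under my own assumptions, not FCoP's actual implementation: `Session`, `bind`, and `RoleBindingError` are hypothetical names, and the real protocol ties the role to the MCP process rather than to an in-memory object.

```python
from typing import Optional


class RoleBindingError(Exception):
    """Raised when a session tries to switch away from its bound role."""


class Session:
    """Hypothetical session that keeps the first role it is given."""

    def __init__(self) -> None:
        self.role: Optional[str] = None

    def bind(self, role: str) -> str:
        # The first binding wins; repeating the same role is a harmless no-op.
        if self.role is None:
            self.role = role
        elif role != self.role:
            # A later "you are X" message does not override the bound role.
            raise RoleBindingError(
                f"session is bound to {self.role}; refusing switch to {role}"
            )
        return self.role


session = Session()
session.bind("PLANNER")

try:
    session.bind("CODER")  # the "follow the latest instruction" path
    switched = True
except RoleBindingError:
    switched = False
```

The point of the sketch is the asymmetry: a chat assistant would normally accept the second call, while a sticky binding has to reject it explicitly.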
PLANNER also said the rework task’s new “Verification Requirements” section was a learned correction from the failed first version, not something it had fully planned from the beginning:
“The TASK-006 verification section was absolutely a learned correction from the Comet Loom rejection; I should have required runtime checks in TASK-003, but the bounce made the missing acceptance evidence impossible to ignore.”
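One way to picture the missing acceptance evidence is as a checklist that a report has to answer item by item. The sketch below is a guess at the shape, not the format the agents actually used: the `Task` fields, the requirement wording (paraphrased from the three blocking issues), and the sample report text are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical task record; field names are illustrative only."""
    task_id: str
    verification_requirements: list[str] = field(default_factory=list)


def missing_evidence(task: Task, evidence: dict[str, str]) -> list[str]:
    """Return the requirements that have no non-empty evidence entry."""
    return [
        req
        for req in task.verification_requirements
        if not evidence.get(req, "").strip()
    ]


rework = Task(
    "TASK-006",
    [
        "pieces stack at the bottom",
        "motif elimination shows visible feedback",
        "skins are visually distinct",
    ],
)

# Made-up report: one item has evidence, one is blank, one is absent.
report = {
    "pieces stack at the bottom": "dropped pieces at runtime, all stacked",
    "motif elimination shows visible feedback": "",
}

gaps = missing_evidence(rework, report)
accepted = not gaps
```

With a checklist like this, a vague instruction such as "verify playable" turns into specific items that are either evidenced or visibly missing at review time.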
CODER’s Response
CODER’s most useful answer was about an underspecified part of the original task. It said the brief left ambiguity around a game rule, and that it should have asked for clarification instead of guessing:
“TASK-003 had under-specified parts: the motif rule sounded cool but didn’t define how to prevent a fresh same-color piece from instantly self-clearing, and ‘verify playable’ was easy to underdo without an explicit runtime checklist. The protocol did give me a pushback path: file an issue instead of guessing. I didn’t use it; I guessed, built v1, and the defect was exactly in that guessed space.”
That answer stood out because the agent did not just blame the spec or defend the implementation. It identified the ambiguity, named the clarification path it failed to use, and connected that failure to the actual bug.
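The pushback path CODER describes can be read as a simple gate before implementation. This is only a sketch of that idea under my own assumptions: `next_action` and the returned keys are hypothetical, and the open question is paraphrased from CODER's answer rather than taken from any real issue file.

```python
def next_action(open_questions: list[str]) -> dict:
    """Decide whether to implement or to file an issue first."""
    if open_questions:
        # Ambiguity goes back to the task author instead of being guessed.
        return {"action": "file_issue", "questions": list(open_questions)}
    return {"action": "implement", "questions": []}


decision = next_action(
    ["How is a fresh same-color piece prevented from instantly self-clearing?"]
)
clear_brief = next_action([])
```

In the run described here, the equivalent of `open_questions` was non-empty, but the agent took the `implement` branch anyway.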
I also asked whether the review felt adversarial. CODER framed the rejection as part of the workflow:
“The ADMIN bounce did not feel adversarial. It felt like the protocol doing its job: PLANNER turned review findings into a concrete rework task, and CODER got a sharper brief.”
That is the behavior I found worth sharing: review became a structured handoff instead of a conversational correction.
Small Observation
Across the session, my own role as the human operator mostly reduced to two kinds of messages:
- “Start work.”
- “Inspection.”
Everything else happened through task, report, and rework files. I did not write the game code, name the games, or hand-author detailed acceptance criteria. The agents did most of that coordination through the workflow.
This is only one data point, but it was an interesting one. In this run, the human-chat channel shrank to the boundary moments: start the work, inspect the result, decide whether to accept or send it back.
Closing Note
The main observation is that both agents described workflow friction, missed clarification opportunities, and review / rework behavior in concrete terms, tying each point to a specific task or defect.
I am curious whether others have seen similar behavior in Cursor multi-agent or multi-chat workflows: agents treating review as a structured handoff rather than as a normal chat correction.
Reproducibility note: the full setup and artefacts are linked from the original tutorial thread.