(Continuously Updated) My Real-Time Review of Cursor Subagents

Artemonim · January 15, 2026, 12:37pm

It was a great 2.5 hours of work!

Senior AI (Main Agent) - GPT-5.2 XHigh
Senior SWE - GPT-5.2 XHigh
Middle SWE - Codex 5.2 XHigh
Middle Coder - Gemini 3 Flash
Junior Coder - Grok Code Fast
Verifier - Gemini 3 Flash

Main Agent report (translated into English and censored via GPT-5.2)

Context & process (how the subagents worked exactly)

Input: tasks from TODO.md (195–234), plus context from Cargo.toml, assets/config/dev.ron, run.ps1.
Constraint: “CI is single-threaded” → at any moment, only one subagent was running, strictly sequentially.
Principle: I didn’t give subagents the entire TODO.md; in each prompt I quoted the exact items, to reduce noise and speed things up.
Quality control: after each package — verifier (as you requested). For code packages I additionally ran local CI: .\run.ps1 -Fast -SkipLaunch.
Subagent memory: zero between calls → when returning “to the same agent” I re-sent the context (and the reasons / what exactly needed fixing).

Task clustering (from hardest to easiest)

Senior SWE (hardest)

World-load refactor: unify entrypoints LoadSave/F1/F9, single fallback + LoadReport.
Bench-scene policy: replace id-prefix with a directory (SceneCategory/SceneFlags).
Bench stall: queues, metrics, stall warning/detection.
3 phases of benchmark_world_load_*: Load(BulkLoad) → Stabilizing → Steady-state(Gameplay, 20s, p50/p95/p99, 1% low, worst hitch).
Trace tooling: rename/move trace-*.json into [censored] + .metadata.ron, update retention in build.ps1.
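The steady-state metrics named in the third bench phase (p50/p95/p99, 1% low, worst hitch) could be computed roughly like this. It is a minimal sketch, not the project's code: FrameStats, percentile and summarize are hypothetical names, percentiles use the nearest-rank method, and "1% low" is assumed to mean the average FPS over the slowest 1% of frames, which is only one of several definitions in use.

```rust
/// Summary of frame times collected during a steady-state window.
/// All names here are illustrative assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct FrameStats {
    pub p50_ms: f64,
    pub p95_ms: f64,
    pub p99_ms: f64,
    /// Average FPS over the slowest 1% of frames (assumed definition of "1% low").
    pub one_percent_low_fps: f64,
    /// Longest single frame in the window.
    pub worst_hitch_ms: f64,
}

/// Nearest-rank percentile over a non-empty, ascending-sorted slice.
/// `p` is a whole percentage in 1..=100; integer math avoids float rounding.
fn percentile(sorted: &[f64], p: usize) -> f64 {
    let rank = (p * sorted.len() + 99) / 100; // ceil(p * n / 100)
    sorted[rank.clamp(1, sorted.len()) - 1]
}

/// Returns `None` when no frames were recorded.
pub fn summarize(frame_times_ms: &[f64]) -> Option<FrameStats> {
    if frame_times_ms.is_empty() {
        return None;
    }
    let mut sorted = frame_times_ms.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));

    // Slowest 1% of frames, rounded up so at least one frame is included.
    let n_slow = (sorted.len() + 99) / 100;
    let slow = &sorted[sorted.len() - n_slow..];
    let mean_slow_ms = slow.iter().sum::<f64>() / slow.len() as f64;

    Some(FrameStats {
        p50_ms: percentile(&sorted, 50),
        p95_ms: percentile(&sorted, 95),
        p99_ms: percentile(&sorted, 99),
        one_percent_low_fps: 1000.0 / mean_slow_ms,
        worst_hitch_ms: sorted[sorted.len() - 1],
    })
}
```

In a setup like the one described, summarize would be fed only the frames recorded in the 20s Steady-state window, so that Load and Stabilizing hitches do not skew the percentiles.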

Middle SWE

Clean up transient world-bound state on VoxelWorldReplacedEvent (at minimum, voxel brush previews).
Overlay: add a line “Present: {mode} @ {refresh_rate}Hz”.
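The overlay line can be illustrated with a small formatting helper. This is a hedged sketch: present_overlay_line is a hypothetical name, the mode is passed as a plain string, and printing "unknown" in place of the number follows the fallback mentioned later in this report, although the exact rendering is an assumption.

```rust
/// Builds the overlay line "Present: {mode} @ {refresh_rate}Hz".
/// `refresh_rate_hz` is `None` when the refresh rate cannot be obtained;
/// the fallback wording is an assumption, not taken from the real code.
pub fn present_overlay_line(mode: &str, refresh_rate_hz: Option<f64>) -> String {
    match refresh_rate_hz {
        // Only print a rate that is a sane positive number.
        Some(hz) if hz.is_finite() && hz > 0.0 => {
            format!("Present: {mode} @ {hz:.0}Hz")
        }
        _ => format!("Present: {mode} @ unknown"),
    }
}
```

A caller would typically rebuild the string only when the mode or the refresh rate changes, which keeps the per-frame overhead low.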

Junior Coder (simple/docs)

Doc policy L0/L1/L2 in doc/living_engine.md.
Traces doc: naming convention + where trace+metadata live and which cargo features.

Timeline: packages → verifier → reaction to issues

Package 1 — Senior SWE: “load unification + scene policy + bench phases + stall metrics”

Run: Senior SWE (agent id: d9079214-1313-4492-a593-102c01d678e1).
What it did: implemented the specified TODO items as one cohesive set of changes.
Check: verifier (agent id: 16feb59c-0201-4912-aa06-f4b814b33955) → OK.
Extra check: my run .\run.ps1 -Fast -SkipLaunch → OK.
Notes:
- It resulted in several large edits (incl. src/client/[censored] and src/crates/living_engine/[censored]), but that matched the package scope.

Package 2 — Middle SWE: “cleanup on world replace + Present/refresh overlay”

Run: Middle SWE (agent id: e6d78bd7-d302-4e01-8cf3-f31d054a9bf0).
Check: verifier (agent id: d0fcb161-4bd7-42dc-87ed-69f2f7077ac5) → OK.
Extra check: my run .\run.ps1 -Fast -SkipLaunch → OK.
Observation: after this package I found an extra untracked file src/client/[censored] (basically a duplicate of existing logic).
- I deleted it manually (without asking a subagent), to avoid leaving junk in the working tree.

Package 3 — Senior SWE: “trace naming + sidecar metadata + retention”

First run: Senior SWE (agent id: ec90e619-a3c0-477d-b62c-61ef97d99b8e).
Check: verifier (agent id: 1a80f265-cb9d-4040-9ec4-83c99adce677) found an issue:
- JSON traces were renamed/moved, but .metadata.ron didn’t actually appear.
- Classification per your rule: not an architectural problem, but an execution/criteria context issue (ECS exit ordering).
Escalation/fix: I brought it back to the same level (Senior SWE) with a clarified requirement:
- write metadata after app.run(), not via ECS Last/AppExit.
- Senior SWE started fixing it, but its report cut off at “clippy fixed, rerun CI” (agent id: c1f49e38-b1b0-45af-afed-5d50d9a56237 with no final status).
My pre-verification:
- I read the changes myself and saw that finalize_trace_artifacts_after_run() now writes .metadata.ron after app.run() (so the guarantee improved).
- I ran verifier again (agent id: 40e1ab19-0a0e-4e06-8782-5a00c4853935) → OK.
CI incident (non-code):
- .\run.ps1 failed because Windows locked [censored] (“failed to remove file”).
- I force-stopped the process (just in case) and deleted the exe, then .\run.ps1 -Fast -SkipLaunch passed.
Conclusion for the package: the “trace JSON + sidecar metadata” requirement is technically satisfied.
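The fix described above (write the sidecar after app.run() returns rather than from an ECS exit-time system) can be sketched with the standard library alone. Only the function name finalize_trace_artifacts_after_run comes from the report; its signature, the metadata fields, the sidecar naming and the run_then_finalize wrapper are assumptions for illustration. The sketch also assumes app.run() actually returns on the target platform.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Maps `trace-x.json` to `trace-x.metadata.ron` (naming scheme is assumed).
pub fn sidecar_path(trace_json: &Path) -> PathBuf {
    trace_json.with_extension("metadata.ron")
}

/// Writes the sidecar next to the trace. Signature and fields are hypothetical.
pub fn finalize_trace_artifacts_after_run(
    trace_json: &Path,
    build_profile: &str,
) -> io::Result<PathBuf> {
    let sidecar = sidecar_path(trace_json);
    let file_name = trace_json
        .file_name()
        .map(|n| n.to_string_lossy().into_owned())
        .unwrap_or_default();
    // A tiny RON-style record; the real metadata layout is not shown in the report.
    let body = format!(
        "(\n    trace: {:?},\n    profile: {:?},\n)\n",
        file_name, build_profile
    );
    fs::write(&sidecar, body)?;
    Ok(sidecar)
}

/// Stand-in for the body of `main`: `run_app` plays the role of `app.run()`
/// and blocks until the app exits; the sidecar is written only afterwards.
pub fn run_then_finalize(run_app: impl FnOnce(), trace_json: &Path) -> io::Result<PathBuf> {
    run_app();
    // No dependence on ECS schedule ordering at exit: this is ordinary
    // sequential code that runs once the app loop has returned.
    finalize_trace_artifacts_after_run(trace_json, "dev")
}
```

The design point is that code placed after the run call does not depend on which systems still execute during shutdown, which is what made the first implementation unreliable.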

Package 4 — Junior Coder: “docs”

Run: Junior Coder (agent id: cd4b6323-297f-46e0-a17f-b838a721ea4b).
Check: verifier (agent id: 6aa3beb2-a801-4c8e-9af9-f1ec822724c5) → OK, and it separately noted that:
- checkboxes in TODO.md were set correctly (docs checked; manual “collect baselines / 3–5 runs” not checked).

Errors/failures in agent management (and how they were corrected)

Failed attempt to run Senior SWE with model=…: trying to specify the model manually was rejected by the system (only fast is available).
- Fix: reran without the model parameter.
Cut-off Senior SWE report while fixing .metadata.ron: the agent said “clippy fixed, rerun” but without outcome.
- Fix: I took over, verified by reading files + ran verifier and CI myself.
CI failure due to locked exe: external Windows/FS problem.
- Fix: delete [censored] and rerun CI.
Junk file: appeared as untracked (temporary).
- Fix: deleted manually to keep the “no disposable code” rule.

What counts as “extra changes” (strictly fact-based)

Extra/junk:
- src/client/[censored] (untracked) — deleted.
- [censored] (untracked) — deleted.
Not extra (expected new files):
- src/client/[censored] — new module for trace tooling.
- src/crates/living_engine/[censored] — extracted module (part of refactor/line-limits).

Work quality & behavior of each subagent (detailed assessment)

Senior SWE (package 1: refactor/bench)

Quality/accuracy: high (cohesive refactor, verifier/CI passed).
Risk/complexity: high (many entrypoints, bench pipeline).
Extra changes: low (most volume was task-driven).
Speed: good, but a big diff.

Middle SWE (cleanup + overlay)

Quality/accuracy: high.
Extra changes: minimal.
Speed: high.
Constraint adherence: correct (no dependencies, low-overhead caching, “unknown” if refresh rate can’t be obtained).

Senior SWE (package 3: trace tooling)

Quality: good, but first metadata implementation wasn’t “proven in practice”.
Reaction to verifier: correct — the problem wasn’t architectural, it was about when the file is guaranteed to be written.
Outcome: moving metadata writing to post-run made it reliable.

Junior Coder (docs)

Quality/accuracy: good, changes were minimal and targeted.
TODO checkboxes: set carefully (didn’t mark manual items).

verifier (as a control tool)

Value: high.
Critical catch: missing .metadata.ron (caught a real gap between “implemented” and “actually works”).

Where to look (quick pointers)

Trace tooling: src/client/[censored], main.rs, build.ps1
Docs: doc/living_engine.md
Bench refactor: src/client/[censored], assets/[censored]
World-load unify: src/client/[censored], src/client/[censored], src/client/[censored]

By the end of the work, the Senior AI context had filled to 84.8K of 272K tokens.

