(Continuously Updated) My Real-Time Review of Cursor Subagents

It was a great 2.5 hours of work!



2026-01-15_15-35-45


  • Senior AI (Main Agent) - GPT-5.2 XHigh
  • Senior SWE - GPT-5.2 XHigh
  • Middle SWE - Codex 5.2 XHigh
  • Middle Coder - Gemini 3 Flash
  • Junior Coder - Grok Code Fast
  • Verifier - Gemini 3 Flash
Main Agent report (translated to English and censored via GPT-5.2)

Context & process (how the subagents worked exactly)

  • Input: tasks from TODO.md (195–234), plus context from Cargo.toml, assets/config/dev.ron, run.ps1.
  • Constraint: “CI is single-threaded” → at any moment, only one subagent was running, strictly sequentially.
  • Principle: I didn’t give subagents the entire TODO.md; in each prompt I quoted the exact items, to reduce noise and speed things up.
  • Quality control: after each package I ran the verifier (as you requested). For code packages I additionally ran local CI: .\run.ps1 -Fast -SkipLaunch.
  • Subagent memory: zero between calls → when returning “to the same agent” I re-sent the context (and the reasons / what exactly needed fixing).
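The process above (one subagent at a time, verifier after each package, full context re-sent on every retry) can be sketched as a small loop. This is only an illustration, not how Cursor dispatches subagents: `Package`, `build_prompt`, `run_sequentially` and the retry limit are hypothetical names, and the worker and verifier are plain closures standing in for real agent calls.

```rust
/// Hypothetical description of one work package handed to a subagent.
struct Package {
    role: &'static str,
    todo_items: Vec<String>,
}

/// Build a self-contained prompt. A subagent keeps no memory between calls,
/// so the quoted TODO items and any verifier feedback are re-sent every time.
fn build_prompt(pkg: &Package, feedback: Option<&str>) -> String {
    let mut prompt = format!("Role: {}\nTODO items:\n", pkg.role);
    for item in &pkg.todo_items {
        prompt.push_str("- ");
        prompt.push_str(item);
        prompt.push('\n');
    }
    if let Some(fb) = feedback {
        prompt.push_str("Previous attempt feedback:\n");
        prompt.push_str(fb);
        prompt.push('\n');
    }
    prompt
}

/// Run packages strictly one at a time: worker first, then verifier.
/// On a verifier failure the same role is retried with the issue attached.
fn run_sequentially<W, V>(
    packages: &[Package],
    mut worker: W,
    mut verifier: V,
    max_attempts: usize,
) -> Vec<(String, bool)>
where
    W: FnMut(&str) -> String,
    V: FnMut(&str) -> Result<(), String>,
{
    let mut results = Vec::new();
    for pkg in packages {
        let mut feedback: Option<String> = None;
        let mut ok = false;
        for _ in 0..max_attempts {
            let prompt = build_prompt(pkg, feedback.as_deref());
            let report = worker(prompt.as_str());
            match verifier(report.as_str()) {
                Ok(()) => {
                    ok = true;
                    break;
                }
                Err(issue) => feedback = Some(issue),
            }
        }
        results.push((pkg.role.to_string(), ok));
    }
    results
}
```

The point of the sketch is that the prompt is rebuilt from scratch on every attempt, which is what "zero memory between calls" forces in practice.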

Task clustering (from hardest to easiest)

Senior SWE (hardest)

  1. World-load refactor: unify entrypoints LoadSave/F1/F9, single fallback + LoadReport.
  2. Bench-scene policy: replace id-prefix with a directory (SceneCategory/SceneFlags).
  3. Bench stall: queues, metrics, stall warning/detection.
  4. 3 phases of benchmark_world_load_*: Load(BulkLoad) → Stabilizing → Steady-state(Gameplay, 20s, p50/p95/p99, 1% low, worst hitch).
  5. Trace tooling: rename/move trace-*.json into [censored] + .metadata.ron, update retention in build.ps1.
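For the steady-state metrics in item 4, a minimal sketch of how p50/p95/p99, 1% low and worst hitch could be derived from a window of frame times. The project's actual implementation is not shown in the report, so everything here is an assumption: the names `FrameStats`, `percentile` and `summarize` are hypothetical, percentiles use the nearest-rank method, and "1% low" is taken as the average FPS over the slowest 1% of frames (other definitions exist).

```rust
/// Summary of one steady-state window; field names are illustrative only.
#[derive(Debug, PartialEq)]
struct FrameStats {
    p50_ms: f64,
    p95_ms: f64,
    p99_ms: f64,
    one_percent_low_fps: f64,
    worst_hitch_ms: f64,
}

/// Nearest-rank percentile over an ascending-sorted, non-empty slice.
fn percentile(sorted: &[f64], p: f64) -> f64 {
    assert!(!sorted.is_empty());
    let rank = (p * sorted.len() as f64 / 100.0).ceil() as usize;
    sorted[rank.clamp(1, sorted.len()) - 1]
}

/// Summarize frame times (milliseconds) collected during the window.
/// Returns `None` when no frames were recorded.
fn summarize(frame_times_ms: &[f64]) -> Option<FrameStats> {
    if frame_times_ms.is_empty() {
        return None;
    }
    let mut sorted = frame_times_ms.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let n = sorted.len();

    // Slowest 1% of frames (at least one frame), averaged and converted to FPS.
    let slow_count = (n / 100).max(1);
    let slow_avg_ms = sorted[n - slow_count..].iter().sum::<f64>() / slow_count as f64;

    Some(FrameStats {
        p50_ms: percentile(&sorted, 50.0),
        p95_ms: percentile(&sorted, 95.0),
        p99_ms: percentile(&sorted, 99.0),
        one_percent_low_fps: 1000.0 / slow_avg_ms,
        worst_hitch_ms: sorted[n - 1],
    })
}
```

In a benchmark like the one described, `summarize` would be called once on the samples from the 20 s Gameplay window, after the Load and Stabilizing phases have been discarded.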

Middle SWE

  1. Clean up transient world-bound state on VoxelWorldReplacedEvent (at minimum, voxel brush previews).
  2. Overlay: line Present: {mode} @ {refresh_rate}Hz.

Junior Coder (simple/docs)

  1. Doc policy L0/L1/L2 in doc/living_engine.md.
  2. Traces doc: naming convention, where trace + metadata live, and which cargo features are involved.

Timeline: packages → verifier → reaction to issues

Package 1 — Senior SWE: “load unification + scene policy + bench phases + stall metrics”

  • Run: Senior SWE (agent id: d9079214-1313-4492-a593-102c01d678e1).

  • What it did: implemented the specified TODO items as one cohesive set of changes.

  • Check: verifier (agent id: 16feb59c-0201-4912-aa06-f4b814b33955) → OK.

  • Extra check: my run .\run.ps1 -Fast -SkipLaunch → OK.

  • Notes:

    • It resulted in several large edits (incl. src/client/[censored] and src/crates/living_engine/[censored]), but that matched the package scope.

Package 2 — Middle SWE: “cleanup on world replace + Present/refresh overlay”

  • Run: Middle SWE (agent id: e6d78bd7-d302-4e01-8cf3-f31d054a9bf0).

  • Check: verifier (agent id: d0fcb161-4bd7-42dc-87ed-69f2f7077ac5) → OK.

  • Extra check: my run .\run.ps1 -Fast -SkipLaunch → OK.

  • Observation: after this package I found an extra untracked file src/client/[censored] (basically a duplicate of existing logic).

    • I deleted it manually (without asking a subagent), to avoid leaving junk in the working tree.

Package 3 — Senior SWE: “trace naming + sidecar metadata + retention”

  • First run: Senior SWE (agent id: ec90e619-a3c0-477d-b62c-61ef97d99b8e).

  • Check: verifier (agent id: 1a80f265-cb9d-4040-9ec4-83c99adce677) found an issue:

    • JSON traces were renamed/moved, but .metadata.ron didn’t actually appear.
    • Classification per your rule: not an architectural problem, but an execution/criteria context issue (ECS exit ordering).
  • Escalation/fix: I brought it back to the same level (Senior SWE) with a clarified requirement:

    • write metadata after app.run(), not via ECS Last/AppExit.
    • Senior SWE started fixing it, but its report cut off at “clippy fixed, rerun CI” (agent id: c1f49e38-b1b0-45af-afed-5d50d9a56237 with no final status).
  • My pre-verification:

    • I read the changes myself and saw that finalize_trace_artifacts_after_run() now writes .metadata.ron after app.run() (so the guarantee improved).
    • I ran verifier again (agent id: 40e1ab19-0a0e-4e06-8782-5a00c4853935) → OK.
  • CI incident (non-code):

    • .\run.ps1 failed because Windows locked [censored] (“failed to remove file”).
    • I force-stopped the process (just in case) and deleted the exe, then .\run.ps1 -Fast -SkipLaunch passed.
  • Conclusion for the package: the “trace JSON + sidecar metadata” requirement is technically satisfied.
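The fix in this package comes down to ordering: the sidecar is written by ordinary code after the app's run call returns, instead of relying on an exit-time ECS system that may never execute. A minimal std-only sketch of that idea follows. The function name `finalize_trace_artifacts_after_run` comes from the report, but its signature, the metadata fields, and the helpers `sidecar_path` and `run_then_finalize` are assumptions made for illustration; `run_app` stands in for `app.run()`.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Sidecar path next to a trace: `trace-x.json` -> `trace-x.metadata.ron`.
fn sidecar_path(trace: &Path) -> PathBuf {
    trace.with_extension("metadata.ron")
}

/// Write the metadata sidecar. The fields (`trace`, `scene`) are hypothetical;
/// the real metadata layout is not shown in the report.
fn finalize_trace_artifacts_after_run(trace: &Path, scene: &str) -> io::Result<PathBuf> {
    let sidecar = sidecar_path(trace);
    let trace_name = trace.file_name().and_then(|n| n.to_str()).unwrap_or("");
    let body = format!("(\n    trace: {:?},\n    scene: {:?},\n)\n", trace_name, scene);
    fs::write(&sidecar, body)?;
    Ok(sidecar)
}

/// Run the app to completion, then finalize. Because finalization is a plain
/// function call after the run returns, it does not depend on schedule
/// ordering during shutdown.
fn run_then_finalize(
    run_app: impl FnOnce(),
    trace: &Path,
    scene: &str,
) -> io::Result<PathBuf> {
    run_app(); // stands in for `app.run()`
    finalize_trace_artifacts_after_run(trace, scene)
}
```

One caveat under the same assumptions: this only holds if the run call actually returns to the caller on the target platform, which is worth checking before relying on it.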

Package 4 — Junior Coder: “docs”

  • Run: Junior Coder (agent id: cd4b6323-297f-46e0-a17f-b838a721ea4b).

  • Check: verifier (agent id: 6aa3beb2-a801-4c8e-9af9-f1ec822724c5) → OK, and it separately noted that:

    • checkboxes in TODO.md were set correctly (docs checked; manual “collect baselines / 3–5 runs” not checked).

Errors/failures in agent management (and how they were corrected)

  • Failed attempt to run Senior SWE with model=…: trying to specify the model manually was rejected by the system (only fast is available).

    • Fix: reran without the model parameter.
  • Cut-off Senior SWE report while fixing .metadata.ron: the agent said “clippy fixed, rerun” but reported no outcome.

    • Fix: I took over, verified by reading files + ran verifier and CI myself.
  • CI failure due to locked exe: external Windows/FS problem.

    • Fix: delete [censored] and rerun CI.
  • Junk file: appeared as untracked (temporary).

    • Fix: deleted manually to keep the “no disposable code” rule.

What counts as “extra changes” (strictly fact-based)

  • Extra/junk:

    • src/client/[censored] (untracked) — deleted.
    • [censored] (untracked) — deleted.
  • Not extra (expected new files):

    • src/client/[censored] — new module for trace tooling.
    • src/crates/living_engine/[censored] — extracted module (part of refactor/line-limits).

Work quality & behavior of each subagent (detailed assessment)

Senior SWE (package 1: refactor/bench)

  • Quality/accuracy: high (cohesive refactor, verifier/CI passed).
  • Risk/complexity: high (many entrypoints, bench pipeline).
  • Extra changes: low (most volume was task-driven).
  • Speed: good, but a big diff.

Middle SWE (cleanup + overlay)

  • Quality/accuracy: high.
  • Extra changes: minimal.
  • Speed: high.
  • Constraint adherence: correct (no dependencies, low-overhead caching, “unknown” if refresh rate can’t be obtained).
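The overlay constraints listed above (no new dependencies, cheap caching, "unknown" when the refresh rate is unavailable) could look roughly like this. It is a sketch, not the code the subagent wrote: `present_overlay_line` and `PresentOverlayCache` are hypothetical names, and the exact text used for the unknown case is an assumption.

```rust
/// Format the overlay line `Present: {mode} @ {refresh_rate}Hz`,
/// falling back to "unknown" when the refresh rate cannot be obtained.
fn present_overlay_line(mode: &str, refresh_rate_hz: Option<u32>) -> String {
    match refresh_rate_hz {
        Some(hz) => format!("Present: {mode} @ {hz}Hz"),
        None => format!("Present: {mode} @ unknown"),
    }
}

/// Caches the formatted line so the string is rebuilt only when the
/// present mode or refresh rate changes, not on every frame.
#[derive(Default)]
struct PresentOverlayCache {
    key: Option<(String, Option<u32>)>,
    line: String,
}

impl PresentOverlayCache {
    fn line(&mut self, mode: &str, refresh_rate_hz: Option<u32>) -> &str {
        let unchanged =
            matches!(&self.key, Some((m, hz)) if m == mode && *hz == refresh_rate_hz);
        if !unchanged {
            self.line = present_overlay_line(mode, refresh_rate_hz);
            self.key = Some((mode.to_string(), refresh_rate_hz));
        }
        &self.line
    }
}
```

Keeping the cache keyed on the inputs rather than on a timer means the overlay costs one comparison per frame in the common case.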

Senior SWE (package 3: trace tooling)

  • Quality: good, but first metadata implementation wasn’t “proven in practice”.
  • Reaction to verifier: correct — the problem wasn’t architectural, it was about when the file is guaranteed to be written.
  • Outcome: moving metadata writing to post-run made it reliable.

Junior Coder (docs)

  • Quality/accuracy: good, changes were minimal and targeted.
  • TODO checkboxes: set carefully (didn’t mark manual items).

Verifier (as a control tool)

  • Value: high.
  • Critical catch: missing .metadata.ron (caught a real gap between “implemented” and “actually works”).

Where to look (quick pointers)

  • Trace tooling: src/client/[censored], main.rs, build.ps1
  • Docs: doc/living_engine.md
  • Bench refactor: src/client/[censored], assets/[censored]
  • World-load unify: src/client/[censored], src/client/[censored], src/client/[censored]

At the end of the work, the Senior AI context was filled with 84.8K tokens out of 272K.