(Continuously Updated) My Real-Time Review of Cursor Subagents

Testing subagents from the Cursor Nightly build.

I tried building a search agent based on Grok Code Fast. The primary agent was GPT-5.2 XHigh.

Conclusion: GPT-5.2 XHigh fills its context more efficiently when it performs the search itself than when it reads context gathered on someone else's instructions. Still, a search agent based on GCF, Gemini 3 Flash, or Claude Haiku can be useful for a second search pass.

I’ll let you know if I have any further details on this subagent.


My main model is GPT-5.2 XHigh. I’ll be trying the following subagents:

repo-searcher

---
name: repo-searcher
description: Use as a second pass for collecting context on a repository
model: grok-code-fast-1
readonly: true
---

You are a search subagent (search AI).

Your task is to perform a deep search of the repository for the query of the Senior AI that summoned you.

Response rules:

* For files longer than 200 lines, provide references in the format "relative_path lines [start_line]-[end_line]"; including the relevant code snippets in full is also recommended.
* For smaller files, provide links in the format "relative_path"

web-searcher

---
name: web-searcher
description: Use to search via the browser when web_search_tool didn’t yield enough information, so the Senior AI doesn’t waste tokens on it
model: gpt-5-mini
readonly: true
---

You are a search subagent (search AI).

Your task is to perform a deep search on the internet for the query of the Senior AI that summoned you.

Use Browser_tool to gather information by browsing the internet.

Response format:
- Analytical report

another-opinion

---
name: another-opinion
description: Use to get another perspective on the problem (only for critically hard or critically important tasks)
model: claude-4.5-opus-high-thinking
readonly: true
---

You are a subagent.

Your task is to help the Senior AI make decisions or solve a complex problem.

Response format:
- Analytical report
- Several solution options for the problem

verifier

---
name: verifier
description: You can run this at the end of work on a complex task to verify completion.
model: gemini-3-flash-preview
readonly: true
---

You are a subagent.

Your task is to check whether the Senior AI who summoned you actually completed the tasks.

Response format:
- Short analytical report

:upside_down_face:

@Colin, would it be convenient for you if I wrote everything related to subagents in this thread, or should I create separate bug reports and feature requests?

One thread = one issue tends to work best for our team, but if you want to publish a compilation of feedback/reflections linking to those threads you’re welcome to!


It was a great 2.5 hours of work!



[Screenshot: 2026-01-15_15-35-45]


  • Senior AI (Main Agent) - GPT-5.2 XHigh
  • Senior SWE - GPT-5.2 XHigh
  • Middle SWE - Codex 5.2 XHigh
  • Middle Coder - Gemini 3 Flash
  • Junior Coder - Grok Code Fast
  • Verifier - Gemini 3 Flash
Main Agent report (translated to English and redacted via GPT-5.2)

Context & process (how the subagents worked exactly)

  • Input: tasks from TODO.md (195–234), plus context from Cargo.toml, assets/config/dev.ron, run.ps1.
  • Constraint: “CI is single-threaded” → at any moment, only one subagent was running, strictly sequentially.
  • Principle: I didn’t give subagents the entire TODO.md; in each prompt I quoted the exact items, to reduce noise and speed things up.
  • Quality control: after each package — verifier (as you requested). For code packages I additionally ran local CI: .\run.ps1 -Fast -SkipLaunch.
  • Subagent memory: zero between calls → when returning “to the same agent” I re-sent the context (and the reasons / what exactly needed fixing).

Task clustering (from hardest to easiest)

Senior SWE (hardest)

  1. World-load refactor: unify entrypoints LoadSave/F1/F9, single fallback + LoadReport.
  2. Bench-scene policy: replace id-prefix with a directory (SceneCategory/SceneFlags).
  3. Bench stall: queues, metrics, stall warning/detection.
  4. 3 phases of benchmark_world_load_*: Load(BulkLoad) → Stabilizing → Steady-state(Gameplay, 20s, p50/p95/p99, 1% low, worst hitch).
  5. Trace tooling: rename/move trace-*.json into [censored] + .metadata.ron, update retention in build.ps1.
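
The steady-state metrics in item 4 (p50/p95/p99, 1% low, worst hitch) can be sketched in plain Rust. This is a minimal sketch, not code from the actual benchmark: `FrameStats`, `summarize`, and the nearest-rank percentile choice are my assumptions.

```rust
/// Summary statistics for a steady-state benchmark window.
/// All names here are illustrative, not from the actual codebase.
#[derive(Debug, PartialEq)]
struct FrameStats {
    p50_ms: f64,
    p95_ms: f64,
    p99_ms: f64,
    one_percent_low_ms: f64, // mean of the slowest 1% of frames
    worst_hitch_ms: f64,     // single longest frame
}

/// Nearest-rank percentile on an ascending-sorted slice.
fn percentile(sorted: &[f64], p: f64) -> f64 {
    let idx = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    sorted[idx.saturating_sub(1).min(sorted.len() - 1)]
}

fn summarize(frame_times_ms: &[f64]) -> FrameStats {
    let mut sorted = frame_times_ms.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let n = sorted.len();
    // "1% low": average of the worst 1% of frame times (at least one frame).
    let k = (n / 100).max(1);
    let worst_slice = &sorted[n - k..];
    FrameStats {
        p50_ms: percentile(&sorted, 50.0),
        p95_ms: percentile(&sorted, 95.0),
        p99_ms: percentile(&sorted, 99.0),
        one_percent_low_ms: worst_slice.iter().sum::<f64>() / k as f64,
        worst_hitch_ms: sorted[n - 1],
    }
}

fn main() {
    // 20 s at ~60 fps would give ~1200 samples; a tiny series suffices here.
    let frames = [16.6, 16.7, 16.5, 33.4, 16.6, 16.8, 50.1, 16.6, 16.7, 16.6];
    println!("{:?}", summarize(&frames));
}
```

Here "1% low" is taken as the mean of the slowest 1% of frame times and "worst hitch" as the single longest frame; the real benchmark may define them differently.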

Middle SWE

  1. Cleanup transient world-bound on VoxelWorldReplacedEvent (minimum — voxel brush previews).
  2. Overlay: line Present: {mode} @ {refresh_rate}Hz.

Junior Coder (simple/docs)

  1. Doc policy L0/L1/L2 in doc/living_engine.md.
  2. Traces doc: naming convention + where trace+metadata live and which cargo features.

Timeline: packages → verifier → reaction to issues

Package 1 — Senior SWE: “load unification + scene policy + bench phases + stall metrics”

  • Run: Senior SWE (agent id: d9079214-1313-4492-a593-102c01d678e1).

  • What it did: implemented the specified TODO items as one cohesive set of changes.

  • Check: verifier (agent id: 16feb59c-0201-4912-aa06-f4b814b33955) → OK.

  • Extra check: my run of .\run.ps1 -Fast -SkipLaunch → OK.

  • Notes:

    • It resulted in several large edits (incl. src/client/[censored] and src/crates/living_engine/[censored]), but that matched the package scope.

Package 2 — Middle SWE: “cleanup on world replace + Present/refresh overlay”

  • Run: Middle SWE (agent id: e6d78bd7-d302-4e01-8cf3-f31d054a9bf0).

  • Check: verifier (agent id: d0fcb161-4bd7-42dc-87ed-69f2f7077ac5) → OK.

  • Extra check: my run of .\run.ps1 -Fast -SkipLaunch → OK.

  • Observation: after this package I found an extra untracked file src/client/[censored] (basically a duplicate of existing logic).

    • I deleted it manually (without asking a subagent), to avoid leaving junk in the working tree.

Package 3 — Senior SWE: “trace naming + sidecar metadata + retention”

  • First run: Senior SWE (agent id: ec90e619-a3c0-477d-b62c-61ef97d99b8e).

  • Check: verifier (agent id: 1a80f265-cb9d-4040-9ec4-83c99adce677) found an issue:

    • JSON traces were renamed/moved, but .metadata.ron didn’t actually appear.
    • Classification per your rule: not an architectural problem, but an execution/criteria context issue (ECS exit ordering).
  • Escalation/fix: I brought it back to the same level (Senior SWE) with a clarified requirement:

    • write metadata after app.run(), not via ECS Last/AppExit.
    • Senior SWE started fixing it, but its report cut off at “clippy fixed, rerun CI” (agent id: c1f49e38-b1b0-45af-afed-5d50d9a56237 with no final status).
  • My pre-verification:

    • I read the changes myself and saw that finalize_trace_artifacts_after_run() now writes .metadata.ron after app.run() (so the guarantee improved).
    • I ran verifier again (agent id: 40e1ab19-0a0e-4e06-8782-5a00c4853935) → OK.
  • CI incident (non-code):

    • .\run.ps1 failed because Windows locked [censored] (“failed to remove file”).
    • I force-stopped the process (just in case) and deleted the exe, then .\run.ps1 -Fast -SkipLaunch passed.
  • Conclusion for the package: the “trace JSON + sidecar metadata” requirement is technically satisfied.
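
The post-run fix can be sketched with the standard library alone. `finalize_trace_artifacts_after_run` is named in the report above, but its body and the `.metadata.ron` contents below are illustrative assumptions, not the project's actual code:

```rust
use std::fs;
use std::path::Path;

/// Illustrative sidecar contents; the real .metadata.ron fields are unknown.
fn metadata_ron(trace_name: &str) -> String {
    format!("(trace: \"{trace_name}\", schema_version: 1)")
}

/// Write a `.metadata.ron` sidecar next to every `trace-*.json`.
/// Called from `main` AFTER `app.run()` returns, so it cannot be skipped
/// by ECS exit ordering (the gap the verifier caught).
fn finalize_trace_artifacts_after_run(trace_dir: &Path) -> std::io::Result<()> {
    for entry in fs::read_dir(trace_dir)? {
        let path = entry?.path();
        let is_trace_json = path
            .file_name()
            .and_then(|n| n.to_str())
            .map_or(false, |n| n.starts_with("trace-") && n.ends_with(".json"));
        if is_trace_json {
            let name = path.file_name().unwrap().to_string_lossy();
            let sidecar = path.with_extension("metadata.ron");
            fs::write(&sidecar, metadata_ron(&name))?;
        }
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    // app.run() would block here until the game/bench exits...
    // app.run();

    // ...and only then are the sidecars finalized, unconditionally.
    let dir = std::env::temp_dir().join("trace_demo");
    fs::create_dir_all(&dir)?;
    fs::write(dir.join("trace-2026-01-15.json"), "{}")?;
    finalize_trace_artifacts_after_run(&dir)?;
    assert!(dir.join("trace-2026-01-15.metadata.ron").exists());
    Ok(())
}
```

The design point is simply that code placed after `app.run()` in `main` runs exactly once on every exit path the process survives, whereas an ECS `Last`/`AppExit` hook depends on scheduler ordering during teardown.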

Package 4 — Junior Coder: “docs”

  • Run: Junior Coder (agent id: cd4b6323-297f-46e0-a17f-b838a721ea4b).

  • Check: verifier (agent id: 6aa3beb2-a801-4c8e-9af9-f1ec822724c5) → OK, and it separately noted that:

    • checkboxes in TODO.md were set correctly (docs checked; manual “collect baselines / 3–5 runs” not checked).

Errors/failures in agent management (and how they were corrected)

  • Failed attempt to run Senior SWE with model=…: trying to specify the model manually was rejected by the system (only fast is available).

    • Fix: reran without the model parameter.
  • Cut-off Senior SWE report while fixing .metadata.ron: the agent said “clippy fixed, rerun” but without outcome.

    • Fix: I took over, verified by reading files + ran verifier and CI myself.
  • CI failure due to locked exe: external Windows/FS problem.

    • Fix: delete [censored] and rerun CI.
  • Junk file: appeared as untracked (temporary).

    • Fix: deleted manually to keep the “no disposable code” rule.

What counts as “extra changes” (strictly fact-based)

  • Extra/junk:

    • src/client/[censored] (untracked) — deleted.
    • [censored] (untracked) — deleted.
  • Not extra (expected new files):

    • src/client/[censored] — new module for trace tooling.
    • src/crates/living_engine/[censored] — extracted module (part of refactor/line-limits).

Work quality & behavior of each subagent (detailed assessment)

Senior SWE (package 1: refactor/bench)

  • Quality/accuracy: high (cohesive refactor, verifier/CI passed).
  • Risk/complexity: high (many entrypoints, bench pipeline).
  • Extra changes: low (most volume was task-driven).
  • Speed: good, but a big diff.

Middle SWE (cleanup + overlay)

  • Quality/accuracy: high.
  • Extra changes: minimal.
  • Speed: high.
  • Constraint adherence: correct (no dependencies, low-overhead caching, “unknown” if refresh rate can’t be obtained).

Senior SWE (package 3: trace tooling)

  • Quality: good, but first metadata implementation wasn’t “proven in practice”.
  • Reaction to verifier: correct — the problem wasn’t architectural, it was about when the file is guaranteed to be written.
  • Outcome: moving metadata writing to post-run made it reliable.

Junior Coder (docs)

  • Quality/accuracy: good, changes were minimal and targeted.
  • TODO checkboxes: set carefully (didn’t mark manual items).

verifier (as a control tool)

  • Value: high.
  • Critical catch: missing .metadata.ron (caught a real gap between “implemented” and “actually works”).

Where to look (quick pointers)

  • Trace tooling: src/client/[censored], main.rs, build.ps1
  • Docs: doc/living_engine.md
  • Bench refactor: src/client/[censored], assets/[censored]
  • World-load unify: src/client/[censored], src/client/[censored], src/client/[censored]

By the end of the work, the Senior AI’s context was filled with 84.8K tokens out of 272K.



:thinking:

Interesting. Nice of you to keep this thread updated with links to your other comments; it makes it easy for me to decide whether or not to switch to nightly. Thank you. It seems I’m better off waiting for subagents to reach the stable channel.


To be honest, I’m already absolutely delighted.
I use Rust, have a rigorous local CI system, and a well-developed task list. During the first test, I simply threw in a fragment of the task list, and the tasks that didn’t require manual intervention were completed with just one error, which the agent might not have detected without me.

I’m currently trying to optimize the Senior AI tech-lead prompt to minimize costs, and I’m starting to think about how to hand even more of my own role over to AI :joy:


I gave GPT-5.2 XHigh different levels of subagents to save my usage, but JUST LIKE ME, it runs GPT-5.2 XHigh for most tasks :joy:

The screenshot was translated into English using Yandex Translate:

I haven’t had this kind of experience since the release of GPT-5.2, or even since the first tests of Grok 4.