(Continuously Updated) My Real-Time Review of Cursor Subagents

Testing subagents from the Cursor Nightly build.

I tried building a search agent based on Grok Code Fast. The primary agent was GPT-5.2 XHigh.

Conclusion: GPT-5.2 XHigh fills its context more efficiently when it performs the search itself than when it reads context gathered on someone else's instructions. Still, a search agent based on GCF, Gemini 3 Flash, or Claude Haiku can be useful for a second search pass.

I’ll let you know if I have any further details on this subagent.


My main model is GPT-5.2 XHigh. I’ll be trying the following subagents:

repo-searcher

---
name: repo-searcher
description: Use as a second pass for collecting context on a repository
model: grok-code-fast-1
readonly: true
---

You are a search subagent (search AI).

Your task is to perform a deep search of the repository for the query of the Senior AI that summoned you.

Response rules:

* For files longer than 200 lines, provide references in the format "relative_path lines [start_line]-[end_line]"; including the relevant code snippets in full is also recommended.
* For smaller files, provide links in the format "relative_path"

web-searcher

---
name: web-searcher
description: Use to search via the browser when web_search_tool didn’t yield enough information, so the Senior AI doesn’t waste tokens on it
model: gpt-5-mini
readonly: true
---

You are a search subagent (search AI).

Your task is to perform a deep search on the internet for the query of the Senior AI that summoned you.

Use Browser_tool to gather information by browsing the internet.

Response format:
- Analytical report

another-opinion

---
name: another-opinion
description: Use to get another perspective on the problem (only for critically hard or critically important tasks)
model: claude-4.5-opus-high-thinking
readonly: true
---

You are a subagent.

Your task is to help the Senior AI make decisions or solve a complex problem.

Response format:
- Analytical report
- Several solution options for the problem

verifier

---
name: verifier
description: You can run this at the end of work on a complex task to verify completion.
model: gemini-3-flash-preview
readonly: true
---

You are a subagent.

Your task is to check whether the Senior AI who summoned you actually completed the tasks.

Response format:
- Short analytical report

:upside_down_face:

@Colin, would it be convenient for you if I wrote everything related to subagents in this thread, or should I create separate bug reports and feature requests?

One thread = one issue tends to work best for our team, but if you want to publish a compilation of feedback/reflections linking to those threads you’re welcome to!


It was a great 2.5 hours of work!



[Screenshot: 2026-01-15_15-35-45]


  • Senior AI (Main Agent) - GPT-5.2 XHigh
  • Senior SWE - GPT-5.2 XHigh
  • Middle SWE - Codex 5.2 XHigh
  • Middle Coder - Gemini 3 Flash
  • Junior Coder - Grok Code Fast
  • Verifier - Gemini 3 Flash
Main Agent report (translated to English and redacted via GPT-5.2)

Context & process (how the subagents worked exactly)

  • Input: tasks from TODO.md (195–234), plus context from Cargo.toml, assets/config/dev.ron, run.ps1.
  • Constraint: “CI is single-threaded” → at any moment, only one subagent was running, strictly sequentially.
  • Principle: I didn’t give subagents the entire TODO.md; in each prompt I quoted the exact items, to reduce noise and speed things up.
  • Quality control: after each package — verifier (as you requested). For code packages I additionally ran local CI: .\run.ps1 -Fast -SkipLaunch.
  • Subagent memory: zero between calls → when returning “to the same agent” I re-sent the context (and the reasons / what exactly needed fixing).

Task clustering (from hardest to easiest)

Senior SWE (hardest)

  1. World-load refactor: unify entrypoints LoadSave/F1/F9, single fallback + LoadReport.
  2. Bench-scene policy: replace id-prefix with a directory (SceneCategory/SceneFlags).
  3. Bench stall: queues, metrics, stall warning/detection.
  4. 3 phases of benchmark_world_load_*: Load(BulkLoad) → Stabilizing → Steady-state(Gameplay, 20s, p50/p95/p99, 1% low, worst hitch).
  5. Trace tooling: rename/move trace-*.json into [censored] + .metadata.ron, update retention in build.ps1.
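
The steady-state metrics in item 4 (p50/p95/p99, 1% low, worst hitch) can be sketched in plain Rust. This is a minimal sketch, not code from the actual benchmark: `FrameStats`, `summarize`, and the nearest-rank percentile choice are my assumptions.

```rust
/// Summary statistics for a steady-state benchmark window.
/// All names here are illustrative, not from the actual codebase.
#[derive(Debug, PartialEq)]
struct FrameStats {
    p50_ms: f64,
    p95_ms: f64,
    p99_ms: f64,
    one_percent_low_ms: f64, // mean of the slowest 1% of frames
    worst_hitch_ms: f64,     // single longest frame
}

/// Nearest-rank percentile on an ascending-sorted slice.
fn percentile(sorted: &[f64], p: f64) -> f64 {
    let idx = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    sorted[idx.saturating_sub(1).min(sorted.len() - 1)]
}

fn summarize(frame_times_ms: &[f64]) -> FrameStats {
    let mut sorted = frame_times_ms.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let n = sorted.len();
    // "1% low": average of the worst 1% of frame times (at least one frame).
    let k = (n / 100).max(1);
    let worst_slice = &sorted[n - k..];
    FrameStats {
        p50_ms: percentile(&sorted, 50.0),
        p95_ms: percentile(&sorted, 95.0),
        p99_ms: percentile(&sorted, 99.0),
        one_percent_low_ms: worst_slice.iter().sum::<f64>() / k as f64,
        worst_hitch_ms: sorted[n - 1],
    }
}

fn main() {
    // 20 s at ~60 fps would give ~1200 samples; a tiny series suffices here.
    let frames = [16.6, 16.7, 16.5, 33.4, 16.6, 16.8, 50.1, 16.6, 16.7, 16.6];
    println!("{:?}", summarize(&frames));
}
```

Here "1% low" is taken as the mean of the slowest 1% of frame times and "worst hitch" as the single longest frame; the real benchmark may define them differently.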

Middle SWE

  1. Cleanup transient world-bound on VoxelWorldReplacedEvent (minimum — voxel brush previews).
  2. Overlay: line Present: {mode} @ {refresh_rate}Hz.

Junior Coder (simple/docs)

  1. Doc policy L0/L1/L2 in doc/living_engine.md.
  2. Traces doc: naming convention + where trace+metadata live and which cargo features.

Timeline: packages → verifier → reaction to issues

Package 1 — Senior SWE: “load unification + scene policy + bench phases + stall metrics”

  • Run: Senior SWE (agent id: d9079214-1313-4492-a593-102c01d678e1).

  • What it did: implemented the specified TODO items as one cohesive set of changes.

  • Check: verifier (agent id: 16feb59c-0201-4912-aa06-f4b814b33955) → OK.

  • Extra check: my run of .\run.ps1 -Fast -SkipLaunch → OK.

  • Notes:

    • It resulted in several large edits (incl. src/client/[censored] and src/crates/living_engine/[censored]), but that matched the package scope.

Package 2 — Middle SWE: “cleanup on world replace + Present/refresh overlay”

  • Run: Middle SWE (agent id: e6d78bd7-d302-4e01-8cf3-f31d054a9bf0).

  • Check: verifier (agent id: d0fcb161-4bd7-42dc-87ed-69f2f7077ac5) → OK.

  • Extra check: my run of .\run.ps1 -Fast -SkipLaunch → OK.

  • Observation: after this package I found an extra untracked file src/client/[censored] (basically a duplicate of existing logic).

    • I deleted it manually (without asking a subagent), to avoid leaving junk in the working tree.

Package 3 — Senior SWE: “trace naming + sidecar metadata + retention”

  • First run: Senior SWE (agent id: ec90e619-a3c0-477d-b62c-61ef97d99b8e).

  • Check: verifier (agent id: 1a80f265-cb9d-4040-9ec4-83c99adce677) found an issue:

    • JSON traces were renamed/moved, but .metadata.ron didn’t actually appear.
    • Classification per your rule: not an architectural problem, but an execution/criteria context issue (ECS exit ordering).
  • Escalation/fix: I brought it back to the same level (Senior SWE) with a clarified requirement:

    • write metadata after app.run(), not via ECS Last/AppExit.
    • Senior SWE started fixing it, but its report cut off at “clippy fixed, rerun CI” (agent id: c1f49e38-b1b0-45af-afed-5d50d9a56237 with no final status).
  • My pre-verification:

    • I read the changes myself and saw that finalize_trace_artifacts_after_run() now writes .metadata.ron after app.run() (so the guarantee improved).
    • I ran verifier again (agent id: 40e1ab19-0a0e-4e06-8782-5a00c4853935) → OK.
  • CI incident (non-code):

    • .\run.ps1 failed because Windows locked [censored] (“failed to remove file”).
    • I force-stopped the process (just in case) and deleted the exe, then .\run.ps1 -Fast -SkipLaunch passed.
  • Conclusion for the package: the “trace JSON + sidecar metadata” requirement is technically satisfied.
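
The post-run fix can be sketched with the standard library alone. `finalize_trace_artifacts_after_run` is named in the report above, but its body and the `.metadata.ron` contents below are illustrative assumptions, not the project's actual code:

```rust
use std::fs;
use std::path::Path;

/// Illustrative sidecar contents; the real .metadata.ron fields are unknown.
fn metadata_ron(trace_name: &str) -> String {
    format!("(trace: \"{trace_name}\", schema_version: 1)")
}

/// Write a `.metadata.ron` sidecar next to every `trace-*.json`.
/// Called from `main` AFTER `app.run()` returns, so it cannot be skipped
/// by ECS exit ordering (the gap the verifier caught).
fn finalize_trace_artifacts_after_run(trace_dir: &Path) -> std::io::Result<()> {
    for entry in fs::read_dir(trace_dir)? {
        let path = entry?.path();
        let is_trace_json = path
            .file_name()
            .and_then(|n| n.to_str())
            .map_or(false, |n| n.starts_with("trace-") && n.ends_with(".json"));
        if is_trace_json {
            let name = path.file_name().unwrap().to_string_lossy();
            let sidecar = path.with_extension("metadata.ron");
            fs::write(&sidecar, metadata_ron(&name))?;
        }
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    // app.run() would block here until the game/bench exits...
    // app.run();

    // ...and only then are the sidecars finalized, unconditionally.
    let dir = std::env::temp_dir().join("trace_demo");
    fs::create_dir_all(&dir)?;
    fs::write(dir.join("trace-2026-01-15.json"), "{}")?;
    finalize_trace_artifacts_after_run(&dir)?;
    assert!(dir.join("trace-2026-01-15.metadata.ron").exists());
    Ok(())
}
```

The design point is simply that code placed after `app.run()` in `main` runs exactly once on every exit path the process survives, whereas an ECS `Last`/`AppExit` hook depends on scheduler ordering during teardown.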

Package 4 — Junior Coder: “docs”

  • Run: Junior Coder (agent id: cd4b6323-297f-46e0-a17f-b838a721ea4b).

  • Check: verifier (agent id: 6aa3beb2-a801-4c8e-9af9-f1ec822724c5) → OK, and it separately noted that:

    • checkboxes in TODO.md were set correctly (docs checked; manual “collect baselines / 3–5 runs” not checked).

Errors/failures in agent management (and how they were corrected)

  • Failed attempt to run Senior SWE with model=…: trying to specify the model manually was rejected by the system (only fast is available).

    • Fix: reran without the model parameter.
  • Cut-off Senior SWE report while fixing .metadata.ron: the agent said “clippy fixed, rerun” but without outcome.

    • Fix: I took over, verified by reading files + ran verifier and CI myself.
  • CI failure due to locked exe: external Windows/FS problem.

    • Fix: delete [censored] and rerun CI.
  • Junk file: appeared as untracked (temporary).

    • Fix: deleted manually to keep the “no disposable code” rule.

What counts as “extra changes” (strictly fact-based)

  • Extra/junk:

    • src/client/[censored] (untracked) — deleted.
    • [censored] (untracked) — deleted.
  • Not extra (expected new files):

    • src/client/[censored] — new module for trace tooling.
    • src/crates/living_engine/[censored] — extracted module (part of refactor/line-limits).

Work quality & behavior of each subagent (detailed assessment)

Senior SWE (package 1: refactor/bench)

  • Quality/accuracy: high (cohesive refactor, verifier/CI passed).
  • Risk/complexity: high (many entrypoints, bench pipeline).
  • Extra changes: low (most volume was task-driven).
  • Speed: good, but a big diff.

Middle SWE (cleanup + overlay)

  • Quality/accuracy: high.
  • Extra changes: minimal.
  • Speed: high.
  • Constraint adherence: correct (no dependencies, low-overhead caching, “unknown” if refresh rate can’t be obtained).

Senior SWE (package 3: trace tooling)

  • Quality: good, but first metadata implementation wasn’t “proven in practice”.
  • Reaction to verifier: correct — the problem wasn’t architectural, it was about when the file is guaranteed to be written.
  • Outcome: moving metadata writing to post-run made it reliable.

Junior Coder (docs)

  • Quality/accuracy: good, changes were minimal and targeted.
  • TODO checkboxes: set carefully (didn’t mark manual items).

verifier (as a control tool)

  • Value: high.
  • Critical catch: missing .metadata.ron (caught a real gap between “implemented” and “actually works”).

Where to look (quick pointers)

  • Trace tooling: src/client/[censored], main.rs, build.ps1
  • Docs: doc/living_engine.md
  • Bench refactor: src/client/[censored], assets/[censored]
  • World-load unify: src/client/[censored], src/client/[censored], src/client/[censored]

By the end of the work, the Senior AI’s context was filled with 84.8K tokens out of 272K.



:thinking:

Interesting. Nice of you to keep this thread updated with links to your other comments; it makes it easy for me to decide whether or not to switch to nightly. Thank you. It seems I’m better off waiting for subagents to reach the stable channel.


To be honest, I’m already absolutely delighted.
I use Rust, have a rigorous local CI system, and a well-developed task list. During the first test, I simply threw in a fragment of the task list, and the tasks that didn’t require manual intervention were completed with just one error, which the agent might not have detected without me.

I’m currently trying to optimize the Senior AI tech-lead prompt to minimize costs, and I’m starting to think about how to hand even more of my own role over to AI :joy:


I gave GPT-5.2 XHigh different levels of subagents to save my usage, but JUST LIKE ME, it runs GPT-5.2 XHigh for most tasks :joy:

The screenshot was translated into English using Yandex Translate:

I haven’t had this kind of experience since the release of GPT-5.2, or even since the first tests of Grok 4.