For now, using Cursor, I wish I had a second agent overseeing the first one and continuously checking the things the first agent always forgets - no workarounds, only the right fix, the right direction, best practices, etc.
This manual checking job that I’m doing would really not be complex at all for another critical-thinking agent; it’s just that the first agent’s context is too loaded with little details, so it hallucinates easily.
It seems it’s unable to think reliably at the high level and the low level at the same time, which is what we as developers always do - looking at the big picture, then zooming in, zooming out, etc.
Also, the first agent is not good at testing itself - it’s so sequential that it can’t complete a complex task if that involves running some tests in Docker, following a side branch of thinking, and coming back… It loses the main branch of thinking.
I’m hoping we’ll get there soon - multiple agents thinking at multiple levels, tasks branching, critical thinking at each level, something like that.
It’ll also be cheaper, btw. One sequential thread requires a huge context, whereas decomposing the task into small focused contexts, composing the results back, and letting multiple agents interact on the same task, each with its own small focused context, won’t require so many tokens.
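To make the token argument concrete, here is a rough back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function names, the step counts, and the token sizes are made up, and it assumes each step re-sends the full history of its own thread as input. It ignores prompt caching, output tokens, and any extra back-and-forth between agents, so treat it as a toy model, not a measurement.

```python
def sequential_tokens(steps: int, tokens_per_step: int) -> int:
    """Input tokens when one thread re-reads its whole history at every step."""
    # Step k sends everything accumulated so far: k * tokens_per_step.
    return sum(k * tokens_per_step for k in range(1, steps + 1))


def decomposed_tokens(steps: int, tokens_per_step: int,
                      agents: int, summary_tokens: int) -> int:
    """Input tokens when the steps are split evenly across focused agents.

    Assumes `steps` is divisible by `agents` (hypothetical simplification).
    """
    steps_per_agent = steps // agents
    # Each agent only re-reads its own short history.
    per_agent = sequential_tokens(steps_per_agent, tokens_per_step)
    # A coordinator reads one summary per agent to compose the result.
    coordinator = agents * summary_tokens
    return agents * per_agent + coordinator


if __name__ == "__main__":
    single = sequential_tokens(steps=40, tokens_per_step=1_000)
    split = decomposed_tokens(steps=40, tokens_per_step=1_000,
                              agents=4, summary_tokens=2_000)
    print(single, split)  # prints 820000 228000
```

Under these made-up numbers the single thread grows quadratically with the number of steps, while four focused agents plus a coordinator stay at roughly a quarter of that. Real savings would depend on how much the agents need to talk to each other.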