Aider (Aider LLM Leaderboards | aider) is a very well-respected benchmark. Cursor Agent would almost certainly beat the best entries in this table, since it uses the top two models listed there and can check its own work.
Aider is also an agent (a direct competitor to Cursor, in fact).
As for why no one has ranked the agents in their own leaderboard, that is an excellent question. I for one would love to see someone benchmark Cursor, Aider, Claude Code, etc., using the strongest LLMs of course. The benchmarks would have to be structured differently though, testing the agent’s understanding of large codebases instead of smaller code snippets.
I agree, but there should be separate categories for TUI and GUI agents. Not everyone can or will use a TUI, and others won't use a GUI.
The same set of tasks and the same models would need to be used, though that is slightly hard given the specific differences in the agents' capabilities.
It's not just about understanding codebases but about being able to complete a task efficiently.
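To make that concrete, here is a minimal sketch of the kind of harness such a benchmark would need: the same task set and the same pinned model for every agent, with success judged by the task's own test suite (not the agent's self-report) and efficiency measured as wall-clock time. The agent names, CLI commands, flags, and task layout below are all placeholders I made up for illustration, not real invocations of any of these tools.

```python
import subprocess
import time
from pathlib import Path

# Placeholder task set: each task is a repo snapshot plus a test command.
# A real benchmark would pin commits and environments per task.
TASKS = [
    {
        "repo": Path("tasks/fix-auth-bug"),
        "prompt": "Fix the login timeout bug.",
        "test_cmd": ["pytest", "-q"],
    },
]

# Hypothetical launch commands; every real agent has its own CLI and flags.
# The key constraint: all agents get the same pinned model and same prompt.
AGENTS = {
    "agent-a": lambda prompt: ["agent-a", "--model", "pinned-model", "--task", prompt],
    "agent-b": lambda prompt: ["agent-b", "run", "--model", "pinned-model", prompt],
}

def run_one(agent_cmd, task):
    """Run one agent on one task; score pass/fail and wall-clock time."""
    # A real harness would reset the repo (e.g. git reset --hard) before each run.
    start = time.monotonic()
    subprocess.run(agent_cmd(task["prompt"]), cwd=task["repo"], timeout=1800)
    elapsed = time.monotonic() - start
    # The task's own tests decide success, not the agent's self-report.
    passed = subprocess.run(task["test_cmd"], cwd=task["repo"]).returncode == 0
    return {"passed": passed, "seconds": round(elapsed, 1)}

if __name__ == "__main__":
    for name, cmd in AGENTS.items():
        runs = [run_one(cmd, t) for t in TASKS]
        rate = sum(r["passed"] for r in runs) / len(runs)
        print(f"{name}: pass rate {rate:.0%}, runs: {runs}")
```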
LiveBench just released this benchmark on coding-agent tools and tried to use the same LLM across agents to focus only on tool capability.
This might answer your question.