Aider (Aider LLM Leaderboards | aider) is a very well-respected benchmark. Cursor Agent would almost certainly beat the best entries in this table, since it uses the top two models listed there and can check its own work.
Aider is also an agent (a direct competitor to Cursor, in fact).
As for why no one has ranked the agents in their own leaderboard, that is an excellent question. I for one would love to see someone benchmark Cursor, Aider, Claude Code, etc., using the strongest LLMs of course. The benchmarks would have to be structured differently though, testing the agent’s understanding of large codebases instead of smaller code snippets.
I agree, but there should be separate categories for TUI and GUI agents. Not everyone can or will use a TUI, and others won't use a GUI.
The same set of tasks and the same models would need to be used, though that is slightly hard given the specific differences in the agents' capabilities.
It's not just about understanding codebases but about being able to complete a task efficiently.
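To make that concrete, here is a minimal sketch of the kind of harness such a benchmark would need: the same task set and the same pinned model for every agent, with success judged by the task's own test suite (not the agent's self-report) and efficiency measured as wall-clock time. The agent names, CLI commands, flags, and task layout below are all placeholders I made up for illustration, not real invocations of any of these tools.

```python
import subprocess
import time
from pathlib import Path

# Placeholder task set: each task is a repo snapshot plus a test command.
# A real benchmark would pin commits and environments per task.
TASKS = [
    {
        "repo": Path("tasks/fix-auth-bug"),
        "prompt": "Fix the login timeout bug.",
        "test_cmd": ["pytest", "-q"],
    },
]

# Hypothetical launch commands; every real agent has its own CLI and flags.
# The key constraint: all agents get the same pinned model and same prompt.
AGENTS = {
    "agent-a": lambda prompt: ["agent-a", "--model", "pinned-model", "--task", prompt],
    "agent-b": lambda prompt: ["agent-b", "run", "--model", "pinned-model", prompt],
}

def run_one(agent_cmd, task):
    """Run one agent on one task; score pass/fail and wall-clock time."""
    # A real harness would reset the repo (e.g. git reset --hard) before each run.
    start = time.monotonic()
    subprocess.run(agent_cmd(task["prompt"]), cwd=task["repo"], timeout=1800)
    elapsed = time.monotonic() - start
    # The task's own tests decide success, not the agent's self-report.
    passed = subprocess.run(task["test_cmd"], cwd=task["repo"]).returncode == 0
    return {"passed": passed, "seconds": round(elapsed, 1)}

if __name__ == "__main__":
    for name, cmd in AGENTS.items():
        runs = [run_one(cmd, t) for t in TASKS]
        rate = sum(r["passed"] for r in runs) / len(runs)
        print(f"{name}: pass rate {rate:.0%}, runs: {runs}")
```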
LiveBench just released this benchmark on coding-agent tools and tried to use the same LLM across agents to focus only on tool capability.
This might answer your question.