AIs critique each other

Would love to have DeepSeek, ChatGPT, and Claude argue and critique each other's generated code, with a human picking the best result if the AIs don't agree.

I’ve done a bunch of tests with o1 and r1 picking each other’s output apart, and I really like the outcomes. An extremely useful tactic for sure.

I take it Cursor does not allow me to swap between models?

Hey, while you can’t necessarily get them to compete directly, you are able to switch models within the same conversation.

Hypothetically, you could set up a situational test where you first get Model A to respond, then switch to Model B and say “The attached conversation history is from a chat between me, the user, and Model A. Evaluate its answer, and correct any errors”.

Might not be exactly what you are hoping for, but food for thought on what is possible inside Cursor.
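
If you want to take the idea further, the same round trip can be scripted outside Cursor. Here’s a minimal sketch assuming both models are reachable through OpenAI-compatible endpoints; the model names, the DeepSeek base URL, and the exact prompt wording are my assumptions, not anything Cursor itself provides:

```python
# Hypothetical sketch of the Model A -> Model B evaluation step,
# scripted via OpenAI-compatible APIs (assumed setup, not Cursor's).
from openai import OpenAI

model_a = OpenAI()  # e.g. o1, with your key in OPENAI_API_KEY
model_b = OpenAI(base_url="https://api.deepseek.com", api_key="YOUR_KEY")  # assumed r1 endpoint

task = "Write a function that deduplicates a list while preserving order."

# Step 1: Model A answers the task.
answer_a = model_a.chat.completions.create(
    model="o1",
    messages=[{"role": "user", "content": task}],
).choices[0].message.content

# Step 2: Model B is shown the history and asked to evaluate it,
# mirroring the prompt suggested above.
critique = model_b.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{
        "role": "user",
        "content": (
            "The attached conversation history is from a chat between me, "
            "the user, and Model A. Evaluate its answer, and correct any "
            f"errors.\n\nUser: {task}\n\nModel A: {answer_a}"
        ),
    }],
).choices[0].message.content

print(critique)
```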

I love when o1 and r1 battle it out.

These kinds of prompts have given me some interesting results:

" I want you to do the following with the other’s proposal:

  1. Tell me what you strongly disagree with

  2. Tell me what you disagree with

3, tell me what you are neutral about

  1. Tell me what you agree with

5 tell me what you strongly agree with

  1. Tell me what you think they missed

  2. Tell me how you can make the plan better

I will share this back with them and we will go several rounds back and forth until we have good consensus on a plan that will be optimized"

“I want you to rewrite your entire proposal but incorporate any new thinking now that you have seen their feedback. Be as detailed and technical as possible with all the fully working code needed.”
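
For anyone who’d rather run those rounds programmatically than paste between chats, here’s a rough sketch of the loop. It’s an assumption-laden illustration: `ask_a` and `ask_b` are hypothetical one-prompt helpers over whatever APIs you use, and the round count is arbitrary:

```python
# Hypothetical driver for the multi-round critique/rewrite process
# described above. ask_a and ask_b are assumed stateless helpers that
# send one prompt to a model and return its reply as a string.
from typing import Callable

CRITIQUE_PROMPT = "<the seven-point critique prompt quoted above>"
REWRITE_PROMPT = "<the rewrite prompt quoted above>"

def consensus_rounds(
    task: str,
    ask_a: Callable[[str], str],
    ask_b: Callable[[str], str],
    rounds: int = 3,
) -> tuple[str, str]:
    """Run several rounds of mutual critique between two models."""
    plan_a = ask_a(task)
    plan_b = ask_b(task)
    for _ in range(rounds):
        # Each model critiques the other's current plan...
        critique_of_b = ask_a(f"{CRITIQUE_PROMPT}\n\nTheir proposal:\n{plan_b}")
        critique_of_a = ask_b(f"{CRITIQUE_PROMPT}\n\nTheir proposal:\n{plan_a}")
        # ...then rewrites its own plan with that feedback folded in.
        plan_a = ask_a(
            f"Your proposal:\n{plan_a}\n\nTheir feedback:\n{critique_of_a}\n\n{REWRITE_PROMPT}"
        )
        plan_b = ask_b(
            f"Your proposal:\n{plan_b}\n\nTheir feedback:\n{critique_of_b}\n\n{REWRITE_PROMPT}"
        )
    # If the two final plans still disagree, a human picks the winner.
    return plan_a, plan_b
```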