Which AI model in Cursor AI is best for coding tasks?

Hi everyone,
Based on your experience with AI models inside Cursor AI, such as:

  • GPT-5

  • GPT-4o

  • Claude-4 Sonnet

  • Claude-4 Sonnet 1M (1M tokens)

  • Claude-3.5 Sonnet

  • Claude Code

  • Opus Code

  • Grok Code Fast

Which model do you find the best for coding tasks?
Are there specific cases where one model works better than others (e.g., large projects, performance optimization, or complex code generation)?

I’m also very interested in hearing your experiences specifically with Claude Code, Opus Code, and Claude Sonnet 1M (the 1 million token model) — how do they perform compared to the others?

And regarding Grok Code Fast, do you think it’s actually good compared to the rest?

Hi Abdelrahman,

A theoretical comparison can only get you so far. A highly practical approach is to run your own personal benchmark on a real project—that’s the definitive way to discover which AI model works best for your specific workflow.

My proposal: use Vitest for instant feedback and structure the whole process in a .md file. Here’s a suggested plan to build on that idea and get the most out of your test:

Focus on a “Full-Stack” Todo App

You can set up a .md file to guide your test.

1. The Project Setup:
A solid foundation is key. You could set up a monorepo with this stack (a minimal Vitest config sketch follows the list):

  • Backend: Node.js with TypeScript and Hono (a modern, fast web framework).

  • Frontend: React (or Vue) with Vite and TypeScript.

  • Testing: Vitest throughout the project for immediate feedback.
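
For the testing piece, a root-level Vitest config might look like this minimal sketch. The `packages/**` glob and the single `node` environment are assumptions about your layout, not requirements:

```ts
// vitest.config.ts: minimal sketch for the monorepo root.
// NOTE: the include glob assumes a packages/ layout; adjust to yours.
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    // "node" suits the Hono backend; switch to "jsdom" (or use
    // per-package configs) for React component tests.
    environment: "node",
    include: ["packages/**/*.test.{ts,tsx}"],
  },
});
```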

2. The Test Workflow (Repeat for Each Model):
For each model (GPT-4o, Claude 4 Sonnet, etc.), you’ll follow the same steps and evaluate how it performs.

Task A: Backend with TDD (Test-Driven Development)
As suggested, have the AI write the tests first; a sketch of what Prompt 1 should produce follows the prompts.

  • Prompt 1: “Write a Vitest test for a CRUD API for todos (Create, Read, Update, Delete). The tests should fail initially since the API doesn’t exist yet.”

  • Prompt 2: “Now, implement the Hono API endpoints to make all the Vitest tests pass.”
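
To make Prompt 1 concrete, here is a hedged sketch of the kind of failing test you’d expect back. It assumes the Hono app is exported from a not-yet-written ./app module, which is exactly why the tests fail first:

```ts
// todos.test.ts: sketch of a first failing CRUD test (TDD step 1)
import { describe, it, expect } from "vitest";
import { app } from "./app"; // doesn't exist yet; the tests must fail initially

describe("todos CRUD API", () => {
  it("creates a todo via POST /todos", async () => {
    const res = await app.request("/todos", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ title: "run the benchmark" }),
    });
    expect(res.status).toBe(201);
    expect((await res.json()).title).toBe("run the benchmark");
  });

  it("lists todos via GET /todos", async () => {
    const res = await app.request("/todos");
    expect(res.status).toBe(200);
    expect(Array.isArray(await res.json())).toBe(true);
  });
});
```

Hono’s built-in `app.request()` lets these tests run without starting a server, which keeps the feedback loop instant.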

Task B: Frontend Implementation

  • Prompt 3: “Create a React component that fetches the todos from the backend and displays them.”

  • Prompt 4: “Extend the component to allow users to add new todos, mark existing ones as complete, and delete them.”
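
As a yardstick for Prompts 3 and 4, a reasonable result might resemble this sketch. The /todos endpoints and the Todo shape are assumptions carried over from Task A:

```tsx
// TodoList.tsx: sketch of the component Prompts 3 and 4 should converge on
import { useEffect, useState } from "react";

type Todo = { id: string; title: string; done: boolean };

export function TodoList() {
  const [todos, setTodos] = useState<Todo[]>([]);
  const [title, setTitle] = useState("");

  // Prompt 3: fetch and display todos on mount
  useEffect(() => {
    fetch("/todos")
      .then((res) => res.json())
      .then(setTodos);
  }, []);

  // Prompt 4: add a new todo
  // (mark-complete and delete handlers omitted to keep the sketch short)
  async function addTodo() {
    const res = await fetch("/todos", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ title }),
    });
    const created: Todo = await res.json();
    setTodos((prev) => [...prev, created]);
    setTitle("");
  }

  return (
    <div>
      <input value={title} onChange={(e) => setTitle(e.target.value)} />
      <button onClick={addTodo}>Add</button>
      <ul>
        {todos.map((t) => (
          <li key={t.id}>{t.title}</li>
        ))}
      </ul>
    </div>
  );
}
```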

3. The Ultimate Challenge: The 1-Million-Token Test
This is where you can really challenge models like Claude Sonnet in MAX mode.

  • Prompt 5 (with the entire codebase in context): “Analyze the entire project. Refactor the backend to replace the in-memory array with persistent storage using a simple db.json file. Update all API endpoints and tests accordingly.”

This final step will truly reveal which model can grasp the context of an entire project and apply consistent changes across both the frontend and backend.
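
For orientation, the persistence module this refactor should produce could look something like the following sketch. The db.json path comes from Prompt 5; the helper names are illustrative, not prescribed:

```ts
// db.ts: sketch of simple file-based persistence replacing the in-memory array
import { readFile, writeFile } from "node:fs/promises";

type Todo = { id: string; title: string; done: boolean };

const DB_PATH = "./db.json"; // path taken from Prompt 5

export async function loadTodos(): Promise<Todo[]> {
  try {
    return JSON.parse(await readFile(DB_PATH, "utf8"));
  } catch {
    return []; // first run: no db.json yet
  }
}

export async function saveTodos(todos: Todo[]): Promise<void> {
  await writeFile(DB_PATH, JSON.stringify(todos, null, 2));
}
```

What you grade here is whether the model wires this into every endpoint and updates the Vitest suite without breaking the frontend contract.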

4. Your Evaluation Scorecard:
At the end, you can fill out a simple scorecard in your .md file for each model:

| Criterion | GPT-4o | Claude 4 Sonnet | Sonnet 1M | Grok Code |
| --- | --- | --- | --- | --- |
| Code Correctness | :white_check_mark:/:cross_mark: | :white_check_mark:/:cross_mark: | :white_check_mark:/:cross_mark: | :white_check_mark:/:cross_mark: |
| Efficiency (# Prompts) | | | | |
| Code Quality | :star::star::star: | :star::star::star::star: | :star::star::star: | :star::star::star: |
| Context Awareness | :star::star::star: | :star::star::star: | :star::star::star: | :star::star::star: |

Your current approach of using Auto mode for general questions and Sonnet (with and without MAX mode) for specific tasks is very smart. This benchmark will give you concrete data on when it’s truly worth switching to a model with a larger context window.

I personally don’t like Opus: too expensive, and the results are poor in combination with my rules.

Good luck with your benchmark! This is the best way to make a well-informed decision.

FYI – not my setup, but not far from mine:

  • Real-time research/search → Gemini 2.5 Pro
  • Planning & Reasoning → Gemini 2.5 Pro and evaluate with an another model
  • Coding → Claude 4 Sonnet w/ Cursor
  • Write Test Cases → Gemini 2.5 Pro
  • Run Test Cases → Auto Mode
  • Debug → o3 or Auto Mode

Important: basic tasks like connecting Git, Vercel, Supabase, and so on can be done in Auto mode, from my point of view.

Have fun with Cursor!
