Hi Abdelrahman,
A theoretical comparison can only get you so far. A highly practical approach is to run your own personal benchmark on a real project—that’s the definitive way to discover which AI model works best for your specific workflow.
My suggestion of using Vitest for instant feedback and structuring the process in a .md file could be a good fit for your case. Here’s a plan that builds on that idea so you get the most out of your test:
Focus on a “Full-Stack” Todo App
You can set up a .md file to guide your test.
1. The Project Setup:
A solid foundation is key. You could set up a monorepo with this stack:
- Backend: Node.js with TypeScript and Hono (a modern, fast web framework).
- Frontend: React (or Vue) with Vite and TypeScript.
- Testing: Vitest throughout the project for immediate feedback (a minimal shared config is sketched below).
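To keep the feedback loop identical across both packages, a single shared Vitest config at the repo root is usually enough. This is a minimal sketch, assuming a plain npm/pnpm workspace layout; adjust paths and environments to your actual structure:

```ts
// vitest.config.ts — minimal shared test setup (a sketch; adapt to your workspace layout)
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    environment: 'node',                        // backend default; the frontend package can override with 'jsdom'
    include: ['**/*.test.ts', '**/*.test.tsx'], // pick up tests in both packages
  },
});
```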
2. The Test Workflow (Repeat for Each Model):
For each model (GPT-4o, Claude 4 Sonnet, etc.), you’ll follow the same steps and evaluate how it performs.
Task A: Backend with Test-Driven Development (TDD)
Just as suggested, have the AI write the tests first.
- Prompt 1: “Write a Vitest test for a CRUD API for todos (Create, Read, Update, Delete). The tests should fail initially since the API doesn’t exist yet.”
- Prompt 2: “Now, implement the Hono API endpoints to make all the Vitest tests pass.”
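To make Prompt 1 concrete, here’s a rough sketch of what a failing-first test could look like. The `app` export and the `/todos` routes are assumptions about code the model has not written yet; your actual prompts and file names may differ:

```ts
// todos.test.ts — failing-first CRUD test (a sketch; the module it imports does not exist yet)
import { describe, it, expect } from 'vitest';
import { app } from './app'; // hypothetical Hono app the model is asked to implement in Prompt 2

describe('Todos API', () => {
  it('creates a todo and reads it back', async () => {
    const created = await app.request('/todos', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ title: 'Write benchmark notes' }),
    });
    expect(created.status).toBe(201);

    const list = await app.request('/todos');
    const todos = await list.json();
    expect(todos).toHaveLength(1);
  });
});
```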
Task B: Frontend Implementation
- Prompt 3: “Create a React component that fetches the todos from the backend and displays them.”
- Prompt 4: “Extend the component to allow users to add new todos, mark existing ones as complete, and delete them.”
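As a reference point for judging the answers to Prompt 3, a read-only baseline could look roughly like this (the `/todos` endpoint and the `Todo` shape are assumptions carried over from the backend task):

```tsx
// TodoList.tsx — read-only baseline for Prompt 3 (Prompt 4 would add create/complete/delete)
import { useEffect, useState } from 'react';

type Todo = { id: string; title: string; completed: boolean };

export function TodoList() {
  const [todos, setTodos] = useState<Todo[]>([]);

  useEffect(() => {
    // assumes the dev server proxies /todos to the Hono backend
    fetch('/todos')
      .then((res) => res.json())
      .then(setTodos)
      .catch(console.error);
  }, []);

  return (
    <ul>
      {todos.map((todo) => (
        <li key={todo.id}>{todo.title}</li>
      ))}
    </ul>
  );
}
```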
3. The Ultimate Challenge: The 1-Million-Token Test
This is where you can really challenge models like Claude Sonnet in MAX mode.
- Prompt 5 (with the entire codebase in context): “Analyze the entire project. Refactor the backend to replace the in-memory array with persistent storage using a simple db.json file. Update all API endpoints and tests accordingly.”
This final step will truly reveal which model can grasp the context of an entire project and apply consistent changes across both the frontend and backend.
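If you want a yardstick for grading the refactor, the persistent store the models produce will likely boil down to something like this hedged sketch (the file and helper names are my assumptions, not a required solution):

```ts
// store.ts — one possible db.json-backed store (a sketch for comparison, not the expected answer)
import { readFile, writeFile } from 'node:fs/promises';

type Todo = { id: string; title: string; completed: boolean };

const DB_FILE = 'db.json';

export async function loadTodos(): Promise<Todo[]> {
  try {
    return JSON.parse(await readFile(DB_FILE, 'utf8'));
  } catch {
    return []; // first run: db.json does not exist yet
  }
}

export async function saveTodos(todos: Todo[]): Promise<void> {
  await writeFile(DB_FILE, JSON.stringify(todos, null, 2));
}
```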
4. Your Evaluation Scorecard:
At the end, you can fill out a simple scorecard in your .md file for each model:
| Criterion | GPT-4o | Claude 4 Sonnet | Sonnet 1M | Grok Code |
| --- | --- | --- | --- | --- |
| Code Correctness | / | / | / | / |
| Efficiency (# Prompts) | | | | |
| Code Quality | | | | |
| Context Awareness | | | | |
Your current approach of using Auto mode for general questions and Sonnet (with and without MAX mode) for specific tasks is very smart. This benchmark will give you concrete data on when it’s truly worth switching to a model with a larger context window.
I personally don’t like Opus: it’s too expensive, and the results are poor in combination with my rules.
Good luck with your benchmark! This is the best way to make a well-informed decision.
FYI, this isn’t my exact setup, but it’s not far off:
- Real-time research/search → Gemini 2.5 Pro
- Planning & Reasoning → Gemini 2.5 Pro, then evaluate with another model
- Coding → Claude 4 Sonnet w/ Cursor
- Write Test Cases → Gemini 2.5 Pro
- Run Test Cases → Auto Mode
- Debug → o3 or Auto Mode
Important: Basic tasks like connecting Git, Vercel, Supabase, and so on can be done with Auto mode, from my point of view.
Have fun with Cursor!