(Continuously Updated) My Real-Time Review of Grok 4

Artemonim · July 16, 2025, 9:09am

Gemini always has a long thinking phase on the first prompt in a dialogue. I assume it plans better if the necessary context is passed with the first prompt, rather than through subsequent read_tool calls. Grok 4 can generally act without context.

condor · July 16, 2025, 9:09am

Yes, it does depend on the model as well.

normalnormie · July 16, 2025, 9:22am

Keep in mind there’s always context when using Cursor; that’s why @condor can one-shot without giving context. Cursor adds the file structure (if ticked in settings), the git graph (if ticked), context related to the prompt through vector search on your codebase index, your MCP tool descriptions, and a system prompt that depends on the model, […]. The model then decides whether that is enough or whether to keep searching (using tools). When indexing, gitignore, cursorindexignore, file structure and file length are all in good shape, you can expect results like @condor’s.

Artemonim · July 16, 2025, 9:31am

Yeah, I was referring to specific files as context. I am too lazy to test it intentionally, but Gemini thinks more and makes more purposeful first edits if you feed it the files manually.

normalnormie · July 16, 2025, 9:49am

Let me add some insights. Here are two benchmarks I really like that could explain your results. https://livebench.ai shows Grok-4 as the top SOTA model in reasoning and the lowest in agentic coding. On https://contextarena.ai, Grok-4 holds 1st place in context understanding up to 32k tokens (800-1600 lines, similar to Claude-4-Thinking), after which the Gemini models take 1st place. So use Grok-4 for planning when the context is under 1600 lines, or Gemini-2.5-pro if it is over, then Claude-4-thinking to code the plan because of its superior integration into Cursor and tool calling. That last step is only needed if you have to reach a bunch of files (or parts of files) outside the initial context.
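As a rough sketch, that rule of thumb could be written down like this. The function name, the `needs_extra_files` flag and the exact 32,000-token cutoff are illustrative assumptions, and the model strings are plain labels rather than real API identifiers:

```python
# Hypothetical helper: encodes the rule of thumb from this post, nothing more.
# The 32k-token cutoff is the point where the post says Gemini overtakes Grok-4.
CONTEXT_CUTOFF_TOKENS = 32_000  # assumed hard threshold; in practice it is fuzzy


def pick_models(context_tokens: int, needs_extra_files: bool) -> dict:
    """Return which model to plan with and which to code with."""
    # Planning: Grok-4 for small contexts, Gemini-2.5-pro for larger ones.
    if context_tokens <= CONTEXT_CUTOFF_TOKENS:
        planner = "grok-4"
    else:
        planner = "gemini-2.5-pro"

    # Coding: hand off to Claude-4-thinking only when the work has to reach
    # files outside the initial context (i.e. heavy tool calling is needed).
    coder = "claude-4-thinking" if needs_extra_files else planner
    return {"planner": planner, "coder": coder}


if __name__ == "__main__":
    print(pick_models(context_tokens=12_000, needs_extra_files=False))
    print(pick_models(context_tokens=80_000, needs_extra_files=True))
```

Treat the cutoff as a starting point to tune against your own codebase, not a measured boundary.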

Artemonim · July 16, 2025, 10:04am

It seems to me that Grok 4 is now the absolute leader and its only problem in Cursor is Cursor.

normalnormie · July 16, 2025, 10:12am

Well, the absolute leader in reasoning. To code we need agentic coding, and there it’s the worst among SOTA models. It needs a lot of work on tool use, but they’re pretty active on it:

ovitrif · August 1, 2025, 10:57pm

Any update here? Is Grok 4 still bad in Cursor?

condor · August 2, 2025, 9:10am

@ovitrif we have solved several issues and launched updates over the last 2 weeks. Do you still have issues?

If yes, could you post a Request ID with privacy disabled so we can look into the details? Cursor – Getting a Request ID

ovitrif · August 3, 2025, 2:49pm

Hi @condor! Went back to using Claude 4 Sonnet and didn’t try again. Seems to be working now, thanks!

turtle260 · August 23, 2025, 3:45pm

In case anyone is wondering, Grok 4 is still super buggy. It absolutely WILL either loop forever reading the same file 100 times, or think for 20 minutes and then make one code edit that doesn’t compile or help with anything.
Same with their “Stealth” Sonic model, which always says “xai artifact” in its thinking trace.
xAI models and Cursor do not work together.

ovitrif · September 16, 2025, 9:18am

What about grok-code-fast-1, which is the public non-stealth version of Sonic?!

Is it worth my while returning to Cursor, or can I keep using Claude Code for the time being?

turtle260 · September 26, 2025, 4:40pm

It’s not as good as Claude 4 generally, but it works better than Grok 4. There are some cases where it’s more useful, such as certain types of iterative development, due to its speed. It’s about Claude 3.5 Sonnet level.

It is free as of now. It’s worth it when it’s free, which says something.
