(Continuously Updated) My Real-Time Review of Grok 4

Gemini always has a long thinking phase at the first prompt in the dialogue. I assume that it plans better if the necessary context is passed to it with the first prompt, rather than through subsequent read_tool calls. Grok 4 can generally act without context.

1 Like

Yes, it does depend on the model as well.

Keep in mind there’s always context when using Cursor; that’s why @condor can one-shot without giving context. Cursor adds the file structure (if ticked in settings), the git graph (if ticked), context related to the prompt through vector search on your codebase index, your MCP tool descriptions, and a system prompt depending on the model, […]. The model then decides whether that’s enough or whether to keep searching (using tools). When indexing, gitignore, cursorindexignore, file structure, and file length are all in good shape, you can expect @condor results.
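
To make that list concrete, here is a rough Python sketch of how those pieces could be stacked into one request. This is not Cursor’s actual code: `ContextSettings`, `assemble_context`, every parameter name, and the sample values are hypothetical, and the order of the parts is only an assumption for illustration.

```python
from dataclasses import dataclass


@dataclass
class ContextSettings:
    # Hypothetical stand-ins for the two checkboxes mentioned in the post.
    include_file_structure: bool = True
    include_git_graph: bool = False


def assemble_context(
    prompt: str,
    settings: ContextSettings,
    system_prompt: str,
    file_structure: str,
    git_graph: str,
    search_hits: list[str],
    tool_descriptions: list[str],
) -> list[str]:
    """Collect the context parts sent along with the user's prompt.

    The ordering is an assumption made for this sketch, not documented behavior.
    """
    parts = [system_prompt]
    if settings.include_file_structure:
        parts.append(file_structure)
    if settings.include_git_graph:
        parts.append(git_graph)
    parts.extend(search_hits)        # results of vector search on the codebase index
    parts.extend(tool_descriptions)  # MCP tool descriptions
    parts.append(prompt)             # the user's own prompt goes last
    return parts


# Made-up sample values, only to show the call shape.
parts = assemble_context(
    prompt="Fix the failing test",
    settings=ContextSettings(include_git_graph=False),
    system_prompt="<model-specific system prompt>",
    file_structure="src/\n  app.py",
    git_graph="* abc123 initial commit",
    search_hits=["def test_app(): ..."],
    tool_descriptions=["example_tool: hypothetical MCP tool description"],
)
print(len(parts))
```

The point is only that the model never starts from an empty context: even with the git graph unticked, the prompt arrives alongside several other parts.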

1 Like

Yeah, I was referring specifically to particular files as context. I am too lazy to test it intentionally, but Gemini thinks more and makes more purposeful first edits if you feed it the files manually.

1 Like

Let me add some insights. Here are two benchmarks I really like that could explain your results. https://livebench.ai shows Grok-4 as the highest SOTA model in reasoning and the lowest in agentic coding. On https://contextarena.ai we can see Grok-4 stays in 1st place in understanding context up to 32k tokens (800-1600 lines, similar to Claude-4-Thinking), then Gemini models take 1st place. So Grok-4 should be used for planning under 1600 lines, or Gemini-2.5-pro if over, then Claude-4-thinking to code the plan because of its superior integration into Cursor and tool calling, though that is only needed if you need to reach a bunch of files (or parts of files) outside the initial context.
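
As a quick illustration, the rule of thumb from this post can be written as a tiny Python helper. The threshold and model names come from the post; the function names are hypothetical, and the fallback in `pick_coder` (letting the planner write the code when no extra files are needed) is my reading of "only needed if", not something the benchmarks state.

```python
# Threshold from the post: upper end of the "800-1600 lines" (~32k tokens) estimate.
GROK_LINE_LIMIT = 1600


def pick_planner(context_lines: int) -> str:
    """Suggest a planning model for a context of the given size in lines."""
    if context_lines < 0:
        raise ValueError("context_lines must be non-negative")
    # "Under 1600 lines" is read strictly here; exactly 1600 goes to Gemini.
    return "Grok-4" if context_lines < GROK_LINE_LIMIT else "Gemini-2.5-pro"


def pick_coder(planner: str, needs_files_outside_context: bool) -> str:
    """Suggest the model that implements the plan."""
    # Assumption: if no files outside the initial context are needed,
    # the planner can also write the code.
    return "Claude-4-thinking" if needs_files_outside_context else planner


planner = pick_planner(1200)
coder = pick_coder(planner, needs_files_outside_context=True)
print(planner, coder)
```

For scale, 32k tokens spread over 800-1600 lines works out to roughly 20 to 40 tokens per line, so the line limit shifts a lot depending on how dense your code is.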

1 Like

It seems to me that Grok 4 is now the absolute leader and its only problem in Cursor is Cursor.

3 Likes

Well, it’s the absolute leader in reasoning. To code, we need agentic coding, and there it’s the worst among SOTA models. It needs a lot of work on tool use, but they’re pretty active on it:

2 Likes

Any update here? Is Grok 4 still bad in Cursor?

1 Like

@ovitrif we have solved several issues and launched updates over the last 2 weeks. Do you still have issues?

If yes, could you post a Request ID with privacy disabled so we can look into the details? See: Cursor – Getting a Request ID

1 Like

Hi @condor! Went back to using Claude 4 Sonnet and didn’t try again. Seems to be working now, thanks!

1 Like

In case anyone is wondering, Grok 4 is still super buggy. It absolutely WILL either get stuck in an infinite loop of reading the same file 100 times, or think for 20 minutes and then make one code edit that doesn’t compile or help with anything.
Same with their “Stealth” Sonic model, which always says “xai artifact” in its thinking trace.
xAI models and Cursor do not work together.

1 Like

What about grok-code-fast-1, which is the public non-stealth version of Sonic?!

Is it worth my while returning to Cursor, or can I keep using Claude Code for the time being?

1 Like

It’s not as good as Claude 4 generally, but it works better than Grok 4. There are some cases where it’s more useful, such as certain types of iterative development, due to its speed. It’s roughly at Claude 3.5 Sonnet level.

It is free as of now. It’s worth it when it’s free, which says something.

1 Like