(Continuously Updated) My Real-Time Review of Grok 4

Artemonim · July 9, 2025, 6:26pm

I’ve been trying to develop a CAT tool for personal use for almost a week now. I wrote an incremental integration test with 48 tasks to track progress and follow a TDD-style development workflow.

I’ve hit a wall at Task 20 — none of the available LLMs have managed to pass it so far. I’ve tried everything, both individually and in relay races: Gemini 2.5 Pro, Gemini 2.5 Pro MAX, Claude 4, Claude 4 MAX, o4-mini, o3, o3-pro (Manual mode one-time situation analysis), and Auto. Auto and o4-mini break previously working logic right out of the gate. o3 messes up the terminal with commands I don’t even recognize.

The project is private, so I can’t share anything more detailed, but I’ll be over the moon if my magic wand gets released tomorrow night.

Artemonim · July 9, 2025, 10:30pm

Well… Maybe it’s time to stop. For now.

tb52ta · July 9, 2025, 10:48pm

[MOD - Condor - removed due to Forum Guideline violation]

Artemonim · July 10, 2025, 5:30am

One more request and I just know I’ll fix it!

tb52ta · July 10, 2025, 7:16am

[MOD - Condor - removed due to Forum Guideline violation]

Artemonim · July 10, 2025, 8:04am

Well, I think I have Grok-ready

Ding · July 10, 2025, 8:36am

I’m sorry to be the bearer of bad news, but it will output "Thinking… Thinking… Thinking… Thinking… Thinking…" and then end the conversation.

Artemonim · July 10, 2025, 8:44am

No way! It’s tripping over edit_tool!

Artemonim · July 10, 2025, 8:47am

Bart Simpson vibes

Artemonim · July 10, 2025, 8:58am

Well

In regular mode it stops because of edit_tool.
Then I corrected the prompt, and it spent $1.24 because of incorrect tab editing.

Trying in MAX mode with more context…

Artemonim · July 10, 2025, 9:09am

And after ~10 minutes and $0.96 with Grok 4 MAX, we have a regression.

Artemonim · July 10, 2025, 9:33am

Same prompt and codebase.
Gemini 2.5 Pro MAX: after 15 minutes and $0.80, it’s still working.

UPD: it gave up after 25 minutes and $1.25.
UPD2: after 25 minutes, the number of passed tests increased to 446.

leoing · July 10, 2025, 12:59pm

Interesting. And to be expected.

Cursor + Model creator usually need to collab on agentic behaviour.

overlord · July 10, 2025, 2:41pm

Cursor is SO broken with Grok 4. So bad. The model just doesn’t think.

danperks · July 10, 2025, 9:55pm

Hey, as with every major model release, we are currently working to improve Grok 4, both on available capacity from xAI and on its stability within Cursor. Each model requires some custom tuning of its system prompt to ensure it behaves well inside Cursor. Models can very rarely be “dropped in” and work immediately.

Artemonim · July 10, 2025, 10:06pm

By the way: when models can’t correctly apply changes via edit_tool, or think they can’t, is that a problem with the apply model, with the LLM itself, or with both?

lolshadowban · July 10, 2025, 11:04pm

Grok has never been good, 4 is no better.

lolshadowban · July 10, 2025, 11:05pm

Gemini fails often.

Artemonim · July 11, 2025, 12:57am

Just over a week ago, I didn’t even know how to use tests.

Now I’m spending hours optimizing a four-layered parameterized integration test for idempotency, trying to get it to run in under 11 minutes
(and that’s with only 47% of the test currently executing).
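
For anyone unfamiliar with the term: an idempotency test checks that applying an operation twice gives the same result as applying it once. Below is a minimal sketch of a parameterized version. Everything in it is an assumption made for illustration: `normalize` is a hypothetical stand-in for the private project's code, and the two parameter lists are invented; the real test has four layers and is not shown anywhere in this thread.

```python
import itertools


def normalize(text: str) -> str:
    """Hypothetical operation under test: collapse runs of whitespace."""
    return " ".join(text.split())


def is_idempotent(func, value) -> bool:
    """Return True if applying func twice equals applying it once."""
    once = func(value)
    return func(once) == once


# Two illustrative parameter "layers"; each combination becomes one case.
PREFIXES = ["", "  ", "\t"]
BODIES = ["hello world", "a\t\tb", "line one\nline two"]


def run_grid():
    """Check every prefix/body combination and collect failing inputs."""
    failures = []
    for prefix, body in itertools.product(PREFIXES, BODIES):
        sample = prefix + body
        if not is_idempotent(normalize, sample):
            failures.append(sample)
    return failures


if __name__ == "__main__":
    print("failing inputs:", run_grid())
```

The number of cases is the product of the layer sizes, which is one reason a multi-layered parameterized test can become slow as layers are added.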

Artemonim · July 11, 2025, 1:13am

You complain and only complain about absolutely every LLM. Either your tasks are too complicated, or you have problems with prompt-engineering. Try using my Agent Compass. It will be interesting to see if it can help you.
