Grok 4 is completely unreliable

Describe the Bug

Grok 4 doesn’t execute prompts correctly in many scenarios where just about any other model performs normally. For example, it will run one irrelevant command in response to a complex query and then just finish without saying anything, despite being told to respond and despite any other model handling the same prompt just fine. This is consistent across simple, medium, and complex prompts. I’ve never seen a model act like this in Cursor…

This happens in about 3 out of 4 uses.

Steps to Reproduce

Use the Grok 4 model in a multi-step chat. Model performance is unreliable: it will do fine on one project with a simple prompt, and on the next it just runs “pip install my_reqs other_req” and then finishes, despite being told to fix a bug. Switching to Grok 3, Claude 4, o3, or really any other model fixes it.

Expected Behavior

The model performs reliably.

Operating System

Linux

Current Cursor Version (Menu → About Cursor → Copy)

Version: 1.2.4
VSCode Version: 1.99.3
Commit: a8e95743c5268be73767c46944a71f4465d05c90
Date: 2025-07-10T16:59:43.242Z
Electron: 34.5.1
Chromium: 132.0.6834.210
Node.js: 20.19.0
V8: 13.2.152.41-electron.0
OS: Linux x64 6.15.5-arch1-1

Additional Information

This didn’t seem to occur when I first tested Grok 4. Running the same prompts again later, just to test, reproduced the failure. No memories were saved, and there is nothing persistent that I can see or change.

Does this stop you from using Cursor

No - Cursor works, but with this issue
