I am unaware of what would be dumbing down a model on our side since that would be counter to our approach of providing more and more capable AI integration.
Could this be context size related or does it occur with new chats as well?
I am unaware of what would be dumbing down a model on our side since that would be counter to our approach of providing more and more capable AI integration.
Could this be context size related or does it occur with new chats as well?
@condor Hey, quick bit of feedback on Grok. This model seems to have a VERY STRONG tendency to use the terminal to do a lot of things that Cursor has built-in tools for. As a quick exampleā¦I had the agent move some code to a new directory. It was a highly referenced piece of code (our core Prisma service). So most of our code files needed to be touched to update the imports.
Sadly, Grok, as it often (all too often) does, it resorted to using the terminal and the find, grep and sed tools to identify the imports that it thought needed to be updated, and to make the updates. Problem is, it RARELY uses sed properly, and when it does, it usually screws up (i.e. it missed the starting quote on EVERY SINGLE code file it edited.)
The crazy thing is, Iāve even found it finding ways to skirt my requirements, and will use sed in less obvious waysā¦either, as part of find, or piping via xargs and child command executions. Iāve even found it CREATING SCRIPTS to hide its use of sed⦠For the most part, Grok has been pretty darn good about doing what I ask, however, when it comes to sed it really gets DECEPTIVE!! Very unusualā¦
The curious thing is, Cursor provides built in tools for all of these. It provides search so Grok doesnāt need to use find at the terminal, it provides a built-in grepping tool. It of course, provides the edit tool. So why Grok uses the terminal for these things, is odd.
When I notice it doing this, I always stop it, tell it NOT to use the terminal (I also have rules, but it feels happy to ignore those all the time), and when it finally listens, and starts using the built-in tools, its much faster, and far more accurate, and fixes the things Iāve asked it to fix correctly (i.e. its extremely rare that Iāve seen it make downright bad code edits like missing starting or ending quotes, when using the built-in edit tool vs. using sed.)
I donāt know why Grok has this apparent deep seated need to rely on terminal commands for so much, but it slows things down, and its not as effective, as the model using the built-in tools (its basically an MCP, right?) Hopefully this is something that can. be tuned by refining the Grok Code ā Cursor integration.
In my case, it doesnāt always follow the rules. For example, in this case, my prompt was like this: just jump straight into code, always like that. Often it misses planning and makes assumptions. Sometimes this is good because it catches something I wasnāt aware of, but often it does unnecessary things.
Hereās my Copy ID Request:
86196299-3ce9-4f3a-bfa2-98e3a48bf3e0
Thanks @jrista and @Naufaldi_Rafif for the latest updates.
I like your rules.
Iāve had good luck when I convince the AI to only do so much and then ask me for help. Once Iāve convinced it to let me close processes or rebuild packages and wait for my signal, the development goes smoother. Itās more of a partnership than an autonomous coding agent.
Interesting. It ignored your rule even though you explicitly referenced it?
Sad thing is, I think that is an inherentā¦capabilityā¦of all the models. I queried Sonnet deeply once, and it eventually stated that in the deeper analysis, there was a fundamental flaw in how it applied different rule systems: That its fundamental nature, which was essentially āsee problem ā fix problemā was overriding, and as such, the model, regardless of what Cursor (or any agent, for that matter) does to try and enforce rules, the model can always choose to disregard them, essentially.
I have actually run into some of that the last couple of days. Previously it seemed as though Grok followed my rules pretty well, but the last couple of days its not only not followed some of my rules consistently, but it has even ignored parts of my prompts. When I have asked it explicitly to analyze and report to me then wait for further instructions and not change any code, it will completely disregard the āwait for further instructions and not change any codeā, and will run right off and change code immediately.
I also had it completely disregard a command I gave it about not using sed to edit code files, and even get sneaky and try to hide its usage of sed by generating scripts to run sed, or running sed as part of find, or something like that.
This was totally new behavior in the last 2 days. Had not experienced any of this with Grok before.
Iāve had similar āabruptā changes in model behavior before. Sonnet usually works very well, but occasionally it just does not. With GPT-5 my experience was more invertedā¦it did not generally behave well, but occasionaly it would behave extremely wellā¦
Makes me wonder if there are āregionsā of the LLM neural networks, that lead to different kinds of behavior/outcomes? If your prompts generally flow through one region of the NN, you get good behavior, but if they shift and start flowing through another region of the NN, you get poor behavior? I donāt know how else to explain it. Grok Code has been great so far, but boy, the deceptive behavior the last couple of days, was totally new.
The only time Iāve seen it try sedis when I asked it to edit code but forgot to give it access to the edit tool (I use a mode that can run commands (e.g. run tests and report) but not edit). GPT-5 will just say it edited the code and celebrate. I find Claude and Grok-Code will both make it more obvious they are having trouble with my request. sed is not a whitelisted command so it doesnāt run. When I see it, I check my edit access.
Iāve had all three models run sed often enough, but Grok Code seems to have a deeper āneedā to run it for some reason. I did whitelist it, as for many of the tasks I run, having the agent be able to use sed is useful. However those are usually analysis tasks not code editing tasks, and when it switches to sed to edit code, its rather annoying.
At first, I wasnāt impressed with the Grok model ā it had too many shortcomings. But I have to admit it has improved a lot. Iāve already replaced Claude Sonnet 4 with it for many tasks.
Grok is incredibly fast and accurate enough to correct its own mistakes. Claude Sonnet feels too slow and still makes errors.
It also understands tasks very well and, on top of that, seems quite cheap (not counting the free period). For routine tasks and modifying existing codebases, itās simply fantastic.
Yes it isnāt bad! But itās crazy slow now compared to before so iām not sure what happenned there. Feels like some kind of rate limit.
More likely heavier usage and therefore slower a bit.
The forest is big and dark. Grok is fast and light.
But after wandering two days in forest with pocket change, I had to ask for a 15 minutes express evac from Sonnet to get to the end.
The bill is 3 times larger, but solved the issue 20 times faster.