Grok free on Cursor - Feedback needed

I’ve missed the thread, so let me restate what I’ve written.

It has ingested Ketamine, lots of digital Ketamine.

The LLM is fast, but in my limited case (PHP) I haven’t seen a real advantage.

I’ve seen it tinker with the existing code at light speed (in agent mode), making the same mistakes for half an hour.

And multiple times it asserted that the problem was fixed, when it was not fixed at all: it simply ran some tests, and those tasks “failed successfully,” like the Windows meme.

And since the project is in a repository, all the changes were tracked.

So I had to fall back to GPT-5-low-fast, which fixed the issues in 5 minutes.

I don’t know, maybe it’s better on Python and JavaScript… I’ll have to try.

2 Likes

It definitely can cause things to get out of control fast. It can be hit or miss, but I often just try again with a clearer prompt or a new chat, or, worst case, manually make a change and tell it to follow the pattern of that change for all the other instances.

1 Like

That is good news, thanks!

Overall, I have generally found Grok Code to be the superior model right now. It solves GPT-5’s long, annoying think times, has the smarts and depth of knowledge that Sonnet has (and sometimes GPT-5, although I think GPT-5’s excessive reasoning cycles actually work against it at times), and, I think, has more breadth of knowledge.

One of the BIGGEST bonuses, I think, with Grok Code is its SPEED. It’s fast in general, and while it is a thinking model, it doesn’t waste the developer’s time with excessive thinking, doesn’t seem to be trying to pack as many tokens into the output via reasoning as it can (which GPT-5 seems to do, quite overtly!), and it just gets the work done.

I think the current single largest shortfall of Grok Code is the lack of @Docs support. I use that so much, I didn’t realize how much until Grok Code couldn’t use it. Now I realize it is an utterly critical feature. Web searches via @Web are just not the same and not nearly as effective, either. As far as I understand, the docs are effectively designed like an MCP? So I would think it should be possible to integrate support into Grok Code.

After that, when it comes to web-app/mobile-app development, the most common form of debugging I use is to drop both error messages AND screenshots into the chat and let the agent work on them. Grok Code does not seem to handle images, which limits its value there.

Otherwise, I think most of the other issues I’ve been encountering may not be Grok. The single largest factor hampering my usage lately has been connectivity instability. It’s all over the place in Cursor right now, and I can’t tell what is actually causing it.

Yes, it’s fast, but accuracy is lacking for me! I just don’t have the confidence to look away while it works like I do with other models like Sonnet.

I have been using this primarily throughout the trial. There have been times I’ve resorted to Claude or kimi-k2 for some things. This is not a “shot” at Grok, by any means; my normal workflow includes switching models throughout the day. Best-tool-for-the-job kind of thing?

I cannot say much more than what I have seen already said. However, I can reinforce the specifics that really stand out to me:

ANNOYANCES

Writing code without approval
I am in the habit of (most times) finishing a prompt with “repeat your understanding back to me and ask clarifying questions before we make a final plan. do not write any code until your plan has been fully discussed and approved.”

FAVORITES

  1. When I know I will need some overly complex solution, or that I will be building out a larger and larger project with that codebase, it’s pretty good at taking a nice detailed PRD and getting close.
  2. It gives decent summaries at the end of a request.
  3. It doesn’t seem to make the same boneheaded, redundant mistakes when interacting with the terminal, compared to Claude or GPT-5. This could be a hallucination of my own, though.

NOTABLE

The slop it brings to the Cursor chat is not ideal.

TO CURSOR TEAM

Your product is suitable to my workflow and, to me, is superior in its class. If you continue to showcase free models like this, as you also did with GPT-5, I can actually justify the overage fees after the pricing-model hike.

MY PLEA

I beg you to enhance functionality (paid only, perhaps?) to allow me to use OpenRouter or my own locally hosted LLM.

I work in a heavily restrictive network, and local LLMs are OK, but I cannot get out to any hosted LLMs. My dev box has no prod access. My prod box can actually run an LLM well enough to handle basic fixes, etc., so I wouldn’t have to juggle and sneakernet files all over the place.

Keep the free usage models coming!

Quick question: during the Grok free trial, do calls to the grok-code-fast-1 model not count toward your quota?

They do not.

1 Like

So far it has been by far the best model I have used.
It feels very much like Claude - almost like it’s a distillation of Claude 4 - but it is less prone to sucking its own d*k over everything (which is great for my sanity).

There’s nothing worse than a model completely failing to do what was asked and you get:

:white_check_mark::white_check_mark: :white_check_mark: :rocket: 1000% COMPLETE AMAZING SUCCESS :white_check_mark::white_check_mark: :white_check_mark: :rocket:

So far - it’s fast, it’s accurate, and it doesn’t go over the top with changes across the entire codebase - it seems quite focused.

I will keep testing it but so far it has continued to behave exactly as I would expect.
I just hope it doesn’t go through the same downgrade that I see around 10 pm Australia time on Claude models, where it basically becomes useless.

Also, with Gemini being completely useless in Cursor lately, this might be the only model that’s worthy of use.

It’s a bit hit-and-miss when the context gets above 100k, I’ve noticed.

I don’t know whether it is just this time of night for me, but every model seems to get ■■■■ around this time. They stop following instructions, they ignore requirements - I really don’t know what is happening, but it’s annoying as all hell.

1 Like

@the1dv if you can provide a Request ID with privacy disabled we will be able to look and see what we can improve there.

I have one additional ‘grievance’ with this model worth mentioning.

It is nearly IMPOSSIBLE to get the agent to present you with a plan and ask clarifying questions. It just starts coding.

I have a very low success rate. Once in a while I can coax it; otherwise I need to flip it over to ASK instead of AGENT.

Any good advice to get around that?

Grok isn’t half bad, but it still gives the “Perfect!” response when it has completely missed a whole bunch of errors. It also doesn’t follow rules well. Also, with Grok being free right now, it’s easy to load it up without worrying about costs.

Personally, I like the optimistic tone that Claude outputs. It makes coding a lot more enjoyable, if a bit frustrating too. The other thing I like is that Claude talks to you while it’s working, and in a more detailed way.

2 Likes

My feedback as more of a niche user :smile: I’m working on C# projects. Most models are somewhat weak for this - I assume other models don’t focus much on training for C# - but Grok Code is doing very well. I have had it create minimal mocks of some obsolete (>10-year-old) libraries to refactor an old project for a new deployment. It has also done well refactoring some complex physics code - it was able to understand a fairly complex constraint solver I had written. Only GPT-5 was able to understand it too (I tried several models to compare outputs). Claude 3.5 made a fair attempt at it, but the newer Claude models all failed to understand it while also trying to generate huge amounts of irrelevant or excessive code.
All in all, I have tried a pretty wide variety of these models, and this one is very fast and generates decent responses. As an experienced dev, I use AI as a “pair programmer” most often - not full vibe coding - but I feel I rarely have to revisit the Grok Code responses.
GPT-5 is my previous favorite, but it’s FAR too expensive for me, so I don’t use it. I use Claude 3.5 as my “daily driver.” The newer Claude models try too hard or stray off topic, requiring me to “over-prompt,” which I find annoying. (Yes, I use PRDs, to-do lists, etc.)

Other feedback:

  • As others noticed, it tends to handle to-do lists oddly. For me, I think it marks things “done” and then tries to solve them - which doesn’t always work out if it is interrupted or loses its place :slight_smile:
  • It also seems to think→change→think→change a lot more than other models, appearing to take small bites of the problem as it goes? Not sure how this affects billing or usage limits, but it is anecdotally prone to cycling back and forth more than other models.
  • As someone also noted, tool calls will fail randomly (hey Cursor - fix this!) and Grok Code will say “Perfect!” - which tends to snowball errors. Related: I have a rule/memory to use ‘dotnet build .sln’ after large changes (the C# linter has problems). Sometimes it will try to build four or five times during a response… with clear errors in the output, but it responds “Perfect!” and then stops fixing that particular issue, leading to confusion.
  • Looking at my dashboard, it often uses far more tokens? But it’s much cheaper, so I don’t know that I care about that.
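The “build, then don’t trust the summary” point above can be made mechanical: check the captured build log yourself instead of accepting the agent’s “Perfect!”. A minimal shell sketch - the log text and the “0 Error(s)” pattern are approximations of MSBuild’s summary line, not verified output, and the check is deliberately naive:

```shell
# Naive guard: succeed (0) when a build log does NOT report zero errors.
# The sample log below is invented; substitute real `dotnet build` output.
has_errors() {
  ! printf '%s\n' "$1" | grep -q " 0 Error(s)"
}

log="Build FAILED.
    2 Error(s)"

if has_errors "$log"; then
  # Don't accept the agent's self-congratulation; hand the log back to it.
  echo "errors present - feed the log back to the agent"
fi
```

In practice you would capture the real output with something like `dotnet build MySolution.sln > build.log 2>&1` (solution name hypothetical) and run the check on that file.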

Overall: I’d happily switch to this model as my main one instead of Claude 3.5 - it’s much, much faster and “feels” better at C#. Thanks for the free trial!

1 Like

It feels very much like Claude - right down to the same self-assuredness even when it is completely wrong - and actually, today is the first time that Claude has outperformed it.

I find it really hard to believe that these models just suddenly change capabilities. The first day of using it was incredible, and now it’s down at Claude 4 level again :frowning:

2 Likes

@mylegitches could you explain in more detail what you are trying to achieve?

The solution may just be a change in the prompt.

I’ve been using grok-code-fast-1 for my React frontend and Node backend projects. I love the speed, and easily got addicted to it. It’s a great model for precise edits where you know exactly what needs to be done and can articulate it.

Ultimately I’ve moved back to Auto for most in-depth tasks, as it takes better initiative and does a more thorough job, whereas Grok (speed aside) feels like it’s rushing to get the job done as quickly as possible, but not as thoroughly.

It probably just needs better prompting on my part, and I’ll admit I’ve gotten lazy because the Auto and Sonnet models take good initiative.

edit: also, this Grok model doesn’t understand Ask mode properly; it always tries to make edits.

1 Like

Seems good to me - it goes above and beyond. When I give it sidequests, like using grep to change text, it obliges. It’s useful at applying logic, a good-quality coding agent, and it’s fast!
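The grep “sidequest” mentioned above boils down to the standard find-then-replace pattern. A minimal sketch - the `demo_src/` directory and the names in it are invented for illustration, and `sed -i` assumes GNU sed:

```shell
# Set up a throwaway file so the example is self-contained.
mkdir -p demo_src
printf 'echo oldName\n' > demo_src/app.sh

# 1. Locate the text: -r recurses, -n prints file:line for each match.
grep -rn "oldName" demo_src/

# 2. Rewrite it in place with sed.
sed -i 's/oldName/newName/g' demo_src/app.sh

cat demo_src/app.sh   # prints: echo newName
```

An agent running this kind of loop is cheap to verify: re-running the grep afterward should return no matches.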

Agree with the feedback that Grok is overeager. I have specific instructions for it to follow when completing tasks, including taking tasks one by one, and it will take multiple tasks and complete them without waiting for my input.

There is also a bug with the “files edited” section at the bottom of the chat window, where it does not show all the changes. This also means I cannot see the diffs in individual files, making it impossible for me to validate the changes.

I haven’t seen it go off the rails yet in implementation, and I have not tried it on anything overly complex.

Love the speed and the price. If it could follow rules more stringently and integrate nicely with the Cursor agent, I’d make it my daily driver.

Update: when I added my instructions explicitly to Project Rules, Grok did a much better job of following them than when I just @-mentioned them in the prompt. It is still eager and completed all the tasks, but it’s a huge improvement.

I have no idea if this is just Claude under a different name, or if this is something on Cursor’s side, but there is an extremely obvious pattern when the models get “dumbed down”: they start rewriting their own previous code. This happened to me for the first time today - Grok Code is just as irritating as Claude about congratulating itself when it hasn’t done things correctly.