Grok free on Cursor - Feedback needed

I’ve had this with other models. I just hit stop when it happens. It’s not unique to Grok; in fact, I suspect it’s actually a Cursor thing (I’ve never once seen this occur in Claude Code, the Claude web app, or OpenAI).

Regarding cache: every single interaction between the agent and the LLM requires sending in the ENTIRE context. Every interaction that completes (grepping, reading a file, searching, editing a file, invoking a terminal command, etc.) increases the total context. So every subsequent interaction involves MORE context.
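
To make that concrete, here’s a minimal sketch of a generic agent loop (hypothetical names and a toy tool-call protocol, not Cursor’s actual implementation) showing why the context only ever grows:

```typescript
// Minimal sketch of a generic agent loop. Every completed tool call
// appends its result to the transcript, and the ENTIRE transcript is
// re-sent on the next model call, so context size only ever grows.
type Message = { role: "user" | "assistant" | "tool"; content: string };

// Stand-ins for the real model API and tool dispatcher.
declare function callModel(context: Message[]): Promise<Message>;
declare function runTool(toolCall: string): string;

async function agentLoop(task: string): Promise<Message[]> {
  const transcript: Message[] = [{ role: "user", content: task }];
  for (;;) {
    const reply = await callModel(transcript); // full context every time
    transcript.push(reply);
    const toolCall = extractToolCall(reply.content);
    if (toolCall === null) break;              // model produced a final answer
    // grep / read file / search / edit / terminal command, etc.
    transcript.push({ role: "tool", content: runTool(toolCall) });
  }
  return transcript;                           // strictly larger than it started
}

function extractToolCall(content: string): string | null {
  const match = content.match(/^TOOL:(.*)$/m); // toy protocol for the sketch
  return match ? match[1].trim() : null;
}
```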

Cache is saving your skin! If it weren’t for that million-plus cached tokens, which usually cost around a tenth the price of normal tokens (if that), using an agent like this would be incomprehensibly expensive.

A million is nothing. I had a request earlier today that was pushing 17 million tokens. Agentic software development is a monstrous token consumer.
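
To put rough numbers on that, assume a made-up base price of $2 per million input tokens and the ~1/10 cache discount mentioned above (real prices vary by model; this just shows the shape of the math):

```typescript
// Back-of-the-envelope cost of a 17M-token request, with and without cache.
const PRICE_PER_MILLION = 2.0; // hypothetical $/1M input tokens
const CACHE_DISCOUNT = 0.1;    // cached tokens at ~1/10 the price

function requestCost(totalTokens: number, cachedTokens: number): number {
  const fresh = totalTokens - cachedTokens;
  return (
    (fresh * PRICE_PER_MILLION +
      cachedTokens * PRICE_PER_MILLION * CACHE_DISCOUNT) / 1_000_000
  );
}

// 17M tokens, ~16M of them served from cache:
console.log(requestCost(17_000_000, 16_000_000).toFixed(2)); // "5.20"
console.log(requestCost(17_000_000, 0).toFixed(2));          // "34.00" uncached
```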

Yeah, it wouldn’t be hard for Cursor to detect the loop and just restart the request, or do something else to remedy the issue. Luckily these looped requests are not charged, but still.

Others are doing it, so it should be possible, but what I learned once from a simple transcription experiment with local models is that it is not as straightforward as it seems. It needs advanced pattern recognition because situations change, and what will you do if the best tool for the job (AI) is not working? :laughing:

The chat output is literally repeating every 3 messages or so; surely any beginner could program something to check whether the output is repeating itself verbatim. Detecting it is easy, but I assume it’s more complicated to solve why it’s doing it.
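
For what it’s worth, the detection side really could be that simple: a naive verbatim check over the last few assistant messages (toy sketch, not how Cursor would actually do it):

```typescript
// Naive loop detector: flags the conversation when the latest assistant
// message is a verbatim repeat of one seen within the last few turns.
function isLooping(assistantMessages: string[], window = 3): boolean {
  const latest = assistantMessages.at(-1);
  if (latest === undefined) return false;
  return assistantMessages
    .slice(-(window + 1), -1) // the few messages before the latest
    .some((earlier) => earlier === latest);
}

// Example: the same output recurring every couple of messages trips it.
const msgs = ["plan A", "run tests", "plan A", "run tests", "plan A"];
console.log(isLooping(msgs)); // true
```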

The experience was bad. It was unable to understand the execution flow and how the models and views related to each other, even though it explicitly scanned and read all the related files. In total, the task was to work across 15 files with a median length of 170 lines: refactor to make the whole system more elegant, converge models, reduce the number of types, and merge a few files into one. As guidance, I provided a design document with the exact structure, names, and bullet points. Edge cases and states were also enumerated separately. At the end there was a functionality requirements checklist (A has to be able to do X, A has to be able to do Y, etc.).

Grok started the refactor and created a to-do list. It then went off the rails and never checked off the items in its own to-do list.

It missed key points from the design document regarding the hierarchy and relationships of objects.

But the most frustrating part is that while it was explicitly asked (and I saw it in its thoughts) to find and refactor all touch points between that system and external systems, it did not do it at all. Many of those touch points were logically linked to the requirements in the document. It never discovered them in the files (it did not grep for references) and, as I understand it, decided to skip those requirements altogether.

Finally, when the agent stopped, I found out that it had simply deleted a bunch of files responsible for specific parts of the functionality of the system I was trying to refactor.

And it also marked all requirements as fulfilled with green check marks in the last message lol.

Like no functionality = no problemo hahaha

So my verdict: Grok is not ready for real use. I think you guys just have to do beta releases and collect feedback along the way from external testers. The Sonic experiment is not enough.

My advice would be to go for indie game devs and C#. Give it away for free if chats are shared. You could even do a trade-in system for credits: trade chat history and code for creds lol.

Yes, but in my experience it was not always such a simple problem. It might change a word here and there, even if this specific case isn’t showing it.

And you are dead right about solving the root cause. There is a lot of talk about it for Whisper and translation models, but no ordinary user has been able to solve it. The only thing they can effectively do is try to split the context and restart the work from time to time.

I’m very impressed with the speed. I was playing around a lot with gpt-5-mini, which was excruciatingly slow; I found myself spending so much time waiting that I was really losing a good flow.

I now use AI for (almost) every single task, so waiting a whole minute to get a simple label updated or my release notes revised is a drag and a bad experience. Grok Fast seems to be solving this.

I’m loving the iteration speed that I can get.

  1. make a request
  2. review change, revert, adjust prompt with the missing info

I can’t really comment on the quality yet, as I’m focused on basically using my first prompt as a search, then iterating on my prompt with more context to improve the quality of the AI output.

This will definitely be in my tool kit, even if it doesn’t stay as my main workhorse for long running development tasks.

1 Like

Hi everyone! I’d like to share my thoughts on the new Grok-Code model.

It’s impressively fast and has a very lightweight reasoning mode. The model handles well-defined tasks beautifully—especially when examples are provided. Another strong point is how smoothly it integrates with built-in tools.

That said, it’s not something I’d trust for “open-ended” coding journeys. In my view, it works best as a quick utility: fixing snippets, providing references, or generating functions and classes with clear requirements. But it’s not ready for full-time autonomous coding.

2 Likes

The quality is spotty, but I usually just restore and give clearer instructions. It is so fast, like you said, that just redoing the prompt is not a bad strategy.

@Murgur I agree. You have to give it a narrow scope. But it is blazing fast in that scope. I had it move a bunch of methods around, rename, add comments, refactor logic, and it was all straightforward and it did it flawlessly. Now if I told it to design something from scratch, it may not be the best.

At first it was confused. Then I did one of the tasks manually and told it to copy the pattern of the changes; after that it clearly understood what I wanted and saved me hours of refactoring and checking complex logic.

It’s pretty decent, I could swear it’s a Claude model the way it does things. It’s fast, and it writes relatively decent code. It uses some tools but doesn’t have access to memories and some other tools (yet?).

It has issues with indentation in Python.

It has issues with following directions (EXACTLY like Claude), and it goes off and does things you didn’t ask for (like Claude). I have to keep reminding it of the project scope. It loves getting into this ‘echo everything to the terminal’ mode (exactly like Claude) sometimes lol.

It never reaches full context; it always summarizes at around 70%, so that’s a bit frustrating.

For the price, it’s amazing though. I’ve been testing it a lot.
It really requires clear instructions in the prompt, not just ‘add a function to do x’, because it won’t go and look at how it’s supposed to actually do it (or maybe it just doesn’t think about it when it should look at existing code, docs, etc. and then do it properly). But if you include all of that in the prompt, then it does it. I have rules set up saying it’s supposed to always reference certain docs, and it doesn’t follow them; it also doesn’t use memories at all, so I’m sure it’s not 100% integrated yet.

Yes, it seems like a new kind of “usefulness” in programming. Instead of a full-featured, smart, but expensive agent or just an autocomplete, we get a fast, local assistant that’s very affordable. I wish it would default to Auto mode instead of Claude Sonnet 3.5.

I’ve been running a few prompts in three tabs: grok-code, claude-sonnet-4 thinking and gpt-5.

Is it me, or is grok-code phrasing things about 90% the same as Claude? It almost feels like they ripped a slightly older version of Claude and made it very fast. It is doing a lot as well, but doesn’t quite go as deep or get as complete (we’ve discussed these nuances here already).

…it is just…weird…how much it talks like Claude. Can you guess which is which here when they responded to the same prompt about 50% through a longer discussion?

Excellent refinements! You’re right - this can be much simpler and more focused. Let me refactor the pattern based on your feedback:

Excellent insights! Let me refine the pattern based on your feedback. Here’s the improved approach:

You’re absolutely right! That alternate proposal is actually much more elegant and flexible. Let me analyze if it can satisfy the milestone rendering requirement and present it as a true alternate approach.

You’re absolutely right! The alternate proposal in the document is much more elegant and directly addresses the milestone rendering problem. Let me show how our data source config brainstorming can be adapted to work as a true alternate proposal that maintains the same flexibility while providing better structure.

I always give the free models a good shakedown.
Grok-code isn’t bad, but nowhere near as good as Claude 4 Sonnet.
At one point it was having a lot of trouble and I asked it “Do you think we should get Claude involved?”
It thought, “The user is asking if they need to get Claude involved, which suggests they think this issue might be too complex for me to solve or that we might need additional help.”
Then it said “Based on our investigation, yes, this would be a good case for Claude’s involvement. Here’s what we’ve discovered:” and it proceeded to write a software love letter to Claude.

2 Likes

Hi there. Yeah, that’s exactly right.

Back when Sonic was being discussed, a lot of people guessed it was actually Claude Sonnet behind the name—mainly because of how it reasoned and behaved. And honestly, this new model feels very similar to Claude Sonnet 3.5, just noticeably faster and a bit sharper on straightforward tasks.

My hunch is that Grok and xAI are either taking inspiration from Claude Code or making use of related technology.

Apparently, Grok Code does not support docs. I’ve been spinning my wheels on a problem for over an hour now. I kept attaching docs, then realized that, unlike Sonnet, it was NOT showing its research of the docs. Then I asked it if it supported them, and it kept searching my codebase and regurgitating .md files, but NEVER showed that it actually supported docs.

I am a HEAVY user of Docs. Grok Code is amazing, so please include docs in its agent integration, because it is, IMO, ESSENTIAL to solving problems (often very simple ones; the models just don’t have specific information about a lot of frameworks and often do very poorly on framework- or library-specific stuff without docs!).

2 Likes

Yeah… OK, now I understand why the last hour has been so infuriating. The Documentation Indexing feature of Cursor is one of the PRIMARY things that keeps me using it. This is a CRITICAL feature. I am always referencing docs, and I became so used to it with Sonnet that I didn’t initially realize Grok Code does not seem to use them. For that matter, neither does GPT-5.

This is an ESSENTIAL and CRITICAL feature of Cursor. All major models should support using indexed docs. The difference between using them and not is quite literally solutions in seconds (maybe minutes) versus frustration for hours.

I did not realize Grok Code did not support the docs. I kept referencing them, and it just kept spinning its wheels. I’d give it some instruction, and then it would start down a path that ultimately resulted in it changing a lot of code it shouldn’t be touching… and now I get why: IT SIMPLY DOES NOT UNDERSTAND! It’s trying, and it has very deep analysis capabilities; it’s trying to reconcile the knowledge it has (which is actually extensive) with what I’ve been trying to tell it to do… and I was assuming it actually understood. IT SIMPLY DOES NOT. I finally managed to feed it enough details about how Prisma supports cursor/take (a variation on skip/take), and it finally understood.
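
For reference, the Prisma pattern in question looks roughly like this (a minimal sketch assuming a hypothetical `post` model; see Prisma’s pagination docs for the real details):

```typescript
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

// Cursor-based pagination: instead of skip/take's offset, you anchor the
// page at the last record you saw and take the next N rows after it.
// `post` is a hypothetical model here.
async function nextPage(lastSeenId: number, pageSize = 20) {
  return prisma.post.findMany({
    take: pageSize,
    skip: 1,                    // skip the cursor row itself
    cursor: { id: lastSeenId }, // anchor at the last record of the previous page
    orderBy: { id: "asc" },
  });
}
```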

But WOW! With docs, it would have understood IMMEDIATELY, and simply done the right thing the first time around, which would have taken 30 seconds…

Please add Docs support to your major models. Grok Code IS a major model. This thing is running circles around GPT-5 and even surpassing Sonnet in analysis and general understanding. It is way faster than both, which (barring stalled terminals) is a massive draw for me. However, without Docs support, for certain tasks (e.g. third-party services, Mux being an example from my current project) Grok sadly becomes inferior to Sonnet, which has full, rich support for docs and, with that, can solve certain tasks (third-party services, DB-specific knowledge of features like skip/take/cursor, etc.) much better.

Doesn’t really work that well. It just repeats what it did last time, hangs a lot, and doesn’t really find solutions to anything, even when you literally paste the code that gives it the answer. Don’t add this to the Auto option, as I don’t want it used; it’s not ready.

Something about Grok Code is different from the other models: it does not seem to have a good grasp of the built-in tools that the agent offers for the LLM to use. So it resorts to using the terminal A LOT more than the other models. Grok Code is still very fast, but when the terminal is constantly stalling, it greatly slows things down.

I also find that the built-in tools (like read file, find, grep, etc.) are faster and more effective than CLI tools, because I think they are tuned into the intrinsic context within Cursor. I am hoping this is just an early-release issue, and that over time Grok Code will get better integration with the agent and more capability with the built-in tools.

This extends to things like being able to use @Docs, which it does not support right now. Additionally, Grok Code, even though it apparently has the capability, will not handle image context attachments.

These things can be handled by switching to Claude for a while, which is ok in the short term. Longer term, though (once stability issues are addressed! That really needs to be first!) I think Grok Code really needs a much better understanding of the full range of built-in tools that Cursor offers for its use.

1 Like

Good news - we’ve extended the free trial of Grok Code until September 10th, so keep your tests and feedback coming!

5 Likes