The efficacy of Grok Code?

I’ve worked heavily with three of the major models available in Cursor now: GPT-5, Claude Sonnet, and Grok Code. I’m a fan of Sonnet; however, I’m a total groupie of Grok Code. Sonnet does a good job and is relatively quick (note: that’s different from fast, and I’ll get into it in a moment). Grok Code, though, is just superior (again, more on that in a moment).

GPT-5 is an absolute disaster…a CASH GRAB disaster, IMO. It works OK: it’s not superb at everything, it’s not terrible at everything, it has plenty of issues, and it has its bonuses. I don’t generally like using it, though, as it is super fussy, but the thing that is most infuriating is the SHEER UNHOLY amount of time it spends “thinking.” I have wasted more time watching GPT-5 “think” than anything else. Drives me crazy. Why do I call it a cash grab? I think OpenAI is ■■■■■■■■ people over with all the excessive thinking, as it PACKS tokens into the output very quickly, to no real beneficial end. I think OpenAI is hoping people don’t notice that the thinking effectively produces jack squat on most tasks, and may only provide a truly useful benefit on particularly challenging problems (beyond what the vast majority of vibe coders will need). I personally think it’s a cash grab, and in fact, a study done by Anthropic demonstrated that excessive reasoning time and cycles by LLMs usually reduce the quality of the results: shorter reasoning, up to a certain point, produces better results. IMO, GPT-5 demonstrates this quite well. It’s the least capable model, in my opinion, even when there are STRINGENT and powerful rules wrapped around the agent to ensure it and the LLM are doing the right things.

I think the thinking-time factor is very interesting. I’ve found Sonnet and Grok Code are both superior for the vast majority of tasks, from the simple and mundane to the fairly complex. They both use short reasoning cycles, and IMO that does seem to make them more effective…I have far less trouble getting either model to do what I want than GPT-5. (I think there is more to it with GPT-5 overall, though…it’s “fussy,” and that is more than just its thinking time.)

While I prefer either Sonnet or Grok Code over GPT-5, I prefer Grok Code overall, because I see significant potential for real efficiency and cost savings with it. GPT-5 and Sonnet are both…chatty. Verbose. They produce not just code output but also a certain amount of conversational filler, and all that text content, between stints of working on code, documentation, or whatever, uses up tokens. I’m not sure which is worse, GPT-5 or Sonnet. On this point GPT-5 might be a little better (EXCEPT for its excessive thinking, which, when accounted for, makes Sonnet much better here!). Grok Code, however? No wasteful chattiness at all! Short, FAST thinking cycles, and just the work…until the end, when it produces a report about the work it performed. Even then, Grok Code is pretty concise, Sonnet is rather verbose, and GPT-5 sometimes is and sometimes isn’t (but again, those MONSTER thinking cycles burn up the token cost!).

I think the approach that Grok Code takes is very, very intriguing. It’s a FAST model, so you can get a lot done in a short time with that 2M tpm output token rate…significantly faster than the normal GPT-5 models or Claude models. The GPT-5 fast models are faster, certainly faster than Sonnet, but I am not sure they are as fast as Grok Code (Grok just SCREAMS!). Grok is not a cheap model, but it is not the most expensive either. However, that is just on a raw per-Mtok basis. Given that Grok Code is not as chatty, uses short thought cycles (usually with minimal output tokens…compare that to GPT-5, which produces a MONSTROUS amount of output tokens!!), and just wastes nothing between cycles of work effort, I think this could make it one of the most efficient models around for coding. It just doesn’t seem to waste tokens on meaningless cruft. If I need to know more about why it is doing something, I can always click into its thought cycles and read the details; I have on occasion needed to do that, and gleaned useful information. With Sonnet, I can clearly see the duplication: the details in the thought cycle are often repeated, sometimes nearly word-for-word or lightly summarized, when it outputs its text to you, the user. I think Grok Code provides a very intriguing, and welcome, optimization here: don’t waste tokens on meaningless cruft!
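To make the “raw per-Mtok price isn’t the whole story” point concrete, here’s a back-of-the-envelope sketch. Every number in it (token counts per task, prices) is an illustrative assumption I made up for the example, not a measured or published figure; only the structure of the argument matters — thinking and chatter tokens get billed just like useful code tokens:

```python
# Back-of-envelope output-side cost for one coding task.
# ALL numbers below are ILLUSTRATIVE ASSUMPTIONS, not real prices or measurements.

def task_cost(useful_out, thinking, chatter, price_per_mtok_out):
    """Dollar cost of one task's output: useful code tokens, plus
    reasoning ('thinking') tokens, plus conversational chatter —
    the provider bills all three the same way."""
    total_tokens = useful_out + thinking + chatter
    return total_tokens * price_per_mtok_out / 1_000_000

# Hypothetical profiles for the SAME task (tokens, $ per Mtok of output):
profiles = {
    "heavy-thinker": task_cost(useful_out=4_000, thinking=20_000, chatter=3_000,
                               price_per_mtok_out=10.0),
    "chatty-model":  task_cost(useful_out=4_000, thinking=2_000, chatter=4_000,
                               price_per_mtok_out=15.0),
    "terse-model":   task_cost(useful_out=4_000, thinking=1_000, chatter=500,
                               price_per_mtok_out=1.5),
}

for name, cost in profiles.items():
    print(f"{name}: ${cost:.4f} per task")
```

Under these made-up numbers, the cheapest-per-Mtok model that also skips the thinking and chatter overhead ends up an order of magnitude cheaper per task, even though the useful output is identical. That’s the efficiency I think Grok Code is exploiting.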

So I’m going to be using Grok Code for the next couple of weeks and will compare my cost and usage against the other two models. I suspect it will end up being cheaper for any given total amount of tokens. Interestingly, since first starting to use Grok Code a couple of days ago, I have already surpassed my usage of either claude-4-sonnet model (both are around 350 Mtok) and am just about to surpass my GPT-5 usage (about 500 Mtok). It’s only been a couple of days, though!

I think a lot of that is the sheer performance of the 2M tpm output rate; combined with the less chatty interface and the focus on just getting the work done, you can rack up some serious usage in a much shorter time than with the other models. You get from point A to point B much faster with Grok, at least IME. GPT-5 fast is faster than Sonnet, but none of them approach the sheer speed of getting REAL CODING WORK done that Grok Code delivers. So I’m quite looking forward to seeing: is this the most cost-effective model yet?? :crossed_fingers:


If Grok Code stays at this price, GPT-5 will have to come down to compete. There’s just way too much value with grok-code. I still use GPT-5 when something doesn’t work with Grok, but that’s a lot less often now.

I am seeing that grok-code is superior to GPT-5 on some tasks, even complex ones, and much faster.

Yeah, I hope this starts a price war, and maybe forces OpenAI to stop wasting time and tokens on GPT-5’s ridiculously long and overdone “reasoning” cycles.

The cost of tokens needs to drop as efficiency increases. I suspect Grok Code is running on Colossus, which is probably currently providing higher efficiency than even OpenAI’s infrastructure. That is the direction LLMs need to go: toward efficiency. Hopefully that will mean more investment in quantum processing, but even just better data centers running more efficient hardware should help bring token costs down (and increase speed…Grok Code’s speed is really what keeps me on it! SO FAST!).

Grok Code offers other efficiency improvements, though, as I mentioned. It just doesn’t waste as much on intermediate chattiness. It does the work, then reports. I like that.