New Llama 4 with 10 million token context

Hello! I see several people complaining when the context doesn’t even reach 2.5 million tokens, etc. Notice how Meta solved this.

What do you think of these results? With 10 million context tokens it would be excellent for all our larger projects.

Let’s vote in the hope that Cursor adds this fantastic model.


Those 10M are only good for tasks that can already be handled easily by vector search and a knowledge graph. All models start losing track of context on complex questions after about 8K tokens. You should not use more than 8K of context if possible.
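A minimal sketch of that retrieve-instead-of-stuff idea: embed chunks, pull only the top-ranked ones into a ~8K-token prompt rather than pasting everything. The `embed` function here is a toy bag-of-words stand-in (so the script runs on its own), and the 4-chars-per-token budgeting heuristic is an assumption, not any particular library’s API:

```python
# Sketch: rank chunks against the query, greedily pack the best ones
# into a prompt that stays under a ~8K token budget.
import math
from collections import Counter

TOKEN_BUDGET = 8_000       # stay under the point where models get lossy
CHARS_PER_TOKEN = 4        # crude budgeting heuristic (assumption)

def embed(text: str) -> Counter:
    # Toy embedding: lowercase bag of words. A real setup would call an
    # embedding model here instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_prompt(query: str, chunks: list[str]) -> str:
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    picked, used = [], 0
    for chunk in ranked:
        cost = len(chunk) // CHARS_PER_TOKEN
        if used + cost > TOKEN_BUDGET:
            break
        picked.append(chunk)
        used += cost
    return "\n---\n".join(picked) + f"\n\nQuestion: {query}"

chunks = [
    "def connect(db_url): opens a pooled database connection",
    "The billing service retries failed charges three times",
    "CSS variables for the dark theme live in theme.css",
]
print(build_prompt("how does billing handle failed charges?", chunks))
```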


From initial investigations, this doesn’t look to be much of a breakthrough model. It performs very well on some benchmarks but poorly on others, and we don’t believe the context window is genuinely functional: if you gave it 10M tokens of context, it wouldn’t have real understanding of all of them.


That’s substantially not true for Gemini 2.5.

Hey bro, a lot of people on social media have been sharing their test feedback on LLaMA 4, and it seems like the response hasn’t been super positive. There’s also a ton of juicy gossip floating around—pretty entertaining stuff. If you’re interested, you should look it up; it’s quite a ride.

I haven’t actually tried or tested LLaMA 4 myself, because based on the general feedback, it doesn’t seem to hold up against the real-world experience of something like Gemini 2.5 Pro Exp. So I probably won’t rush to try it anytime soon. Let’s wait for the dust to settle—public opinion tends to be way more reliable than their own hype anyway.

There is a ‘sweet spot’ with Gemini 2.5 Pro though. I feel like past around the 200K mark, e.g. once you’re pasting entire codebases, returns diminish significantly. (Using the Google AI Studio UI via RepoPrompt here.)

Yeah, I’d agree that 200K is where you start to notice performance falling off.

It’s also half the per-token cost below 200K, so it’s definitely better to stay under that where possible.
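Rough math on that price break, a sketch assuming the input rates Gemini 2.5 Pro published around this time ($1.25 per million input tokens for prompts up to 200K, $2.50 above); treat the numbers as assumptions and check the current price sheet:

```python
# Back-of-envelope on the <=200K price break (rates are assumptions).
RATE_SMALL = 1.25 / 1_000_000   # $/token for prompts <= 200K tokens
RATE_LARGE = 2.50 / 1_000_000   # $/token for prompts >  200K tokens

def input_cost(prompt_tokens: int) -> float:
    rate = RATE_SMALL if prompt_tokens <= 200_000 else RATE_LARGE
    return prompt_tokens * rate

print(f"${input_cost(150_000):.4f}")  # $0.1875 at the cheaper tier
print(f"${input_cost(250_000):.4f}")  # $0.6250 once you cross 200K
```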

But having 200K of truly usable context is revolutionary - every other model falls off hard well before that.

Yeah, even though Claude 3.7 can supposedly take 200K, the drop-off in output coherence after 75K is tangible.
