Those 10M tokens are only good for tasks that can be handled easily by vector search or a knowledge graph. All models start losing track of context on complex questions beyond about 8k tokens, so you should stay under 8k of context whenever possible.
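To illustrate the vector-search point, here's a minimal sketch of keeping a prompt under that 8k budget by retrieving only the most relevant chunks. This is my own toy example, not anything from LLaMA 4: the `retrieve` helper, the bag-of-words similarity, and the word-count token proxy are all simplifying assumptions (a real setup would use proper embeddings and a real tokenizer).

```python
import math
import re
from collections import Counter

def bow(text):
    """Bag-of-words count vector (lowercased word counts)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(chunks, question, token_budget=8000):
    """Rank chunks by similarity to the question, then greedily keep
    the best-scoring ones that still fit under the token budget
    (word count is used as a crude stand-in for tokens)."""
    q = bow(question)
    ranked = sorted(chunks, key=lambda c: cosine(bow(c), q), reverse=True)
    picked, used = [], 0
    for chunk in ranked:
        cost = len(chunk.split())
        if used + cost > token_budget:
            continue  # skip chunks that would blow the budget
        picked.append(chunk)
        used += cost
    return picked
```

The idea is simply that the model never sees 10M tokens at once; you select a small, relevant slice and send only that.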
From initial investigations, this doesn't look like much of a breakthrough model. It performs very well on some benchmarks but poorly on others, and we don't believe the context window is genuinely functional: if you gave it 10M tokens of context, it wouldn't have real understanding of all of them!
Hey bro, a lot of people on social media have been sharing their test feedback on LLaMA 4, and it seems like the response hasn't been super positive. There's also a ton of juicy gossip floating around; pretty entertaining stuff. If you're interested, you should look it up; it's quite a ride.
I haven’t actually tried or tested LLaMA 4 myself, because based on the general feedback, it doesn’t seem to hold up against the real-world experience of something like Gemini 2.5 Pro Exp. So I probably won’t rush to try it anytime soon. Let’s wait for the dust to settle—public opinion tends to be way more reliable than their own hype anyway.
There is a 'sweet spot' with Gemini 2.5 Pro, though. In my experience, once you get past around the 200k-token mark, such as when pasting entire codebases, returns diminish significantly. I'm using the Google AI Studio UI via RepoPrompt here.