Gemini 3 Flash has become absolute garbage

Gemini 3 Flash has become absolute garbage. Google is treating people like fools.

Hey, I see the screenshot. It looks like the model literally followed the request to repeat the rule 10 times, and then did a broad search across `**/*`.

Can you clarify what exactly changed? For example:

  • What task were you trying to do, and what did the model do before vs now?
  • About when did you notice the regression?
  • What Cursor version are you on?

As a workaround, try switching to a different model (Auto, Claude Sonnet, or Gemini 3 Pro) and compare the results.

If you can share a specific case with repro steps, it’ll be easier to tell whether this is a model-side bug or something in your rules setup.

I’m using the latest version of Cursor. I usually ask the AI to explore my project at the start of a conversation to understand the basics (to build up sufficient context). However, I’ve noticed that the AI often directly uses `*/` to perform a global search from the root directory, which is very time-consuming (and may also consume excessive tokens).
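To illustrate the cost, here is a self-contained Python sketch (plain `pathlib` globbing over a throwaway directory; it has nothing to do with Cursor's internals, and the numbers are only there to make the point):

```python
# Illustration: a root-level "**/*" glob walks everything, including
# dependency folders, while a targeted pattern stays cheap.
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())

# Simulate a project: a tiny src tree plus a large dependency folder.
(root / "src").mkdir()
(root / "src" / "app.py").write_text("print('hi')\n")
deps = root / "node_modules" / "lib"
deps.mkdir(parents=True)
for i in range(500):
    (deps / f"mod_{i}.js").write_text("// stub\n")

broad = list(root.glob("**/*"))            # scans the dependency tree too
targeted = list(root.glob("src/**/*.py"))  # only the code that matters

print(len(broad))     # 504: every file and directory under the root
print(len(targeted))  # 1: just src/app.py
```

The same asymmetry applies to an agent: one scoped search returns a handful of relevant paths, while a root-level wildcard drags the whole dependency tree into context.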

So I added rules to my rule settings to avoid global searches. But here’s the key issue: Gemini 3 Flash completely ignores my rules. I’ve tried adjusting the wording many times, but it hasn’t helped at all.

Then I ran a further test: I first asked the AI to repeat my rules 10 times, and as you can see from the screenshot, Gemini 3 Flash still ended up using `*/` for the search.

Thanks for the details - the picture is much clearer now.

Gemini 3 Flash really does follow rules less reliably than some other models - it’s a known limitation. Models differ in how strictly they stick to instructions, and Flash-class models are generally weaker at instruction following.

A couple things to help figure this out:

  1. Can you share your rules file content? I’m curious about the format - are you using .cursor/rules/ with alwaysApply: true, or global rules in settings?

  2. Try switching to Claude Sonnet or Gemini 3 Pro - they follow rules much better. If the issue doesn’t happen with them, it’ll confirm it’s the Flash model itself.

  3. The Request ID from the chat where Flash ignored the rules would also help (top right of the chat > Copy Request ID).

Let me know how it goes with the other models.

Here are my rules (global user rules):

1. Communicate with the user in Chinese.
2. Prefer using Cursor’s built-in tools (such as Glob, Grep, Read, SemanticSearch, etc.), and do not use shell commands like ls as a substitute.
3. Do not use wildcards like `*` or `**/*` to perform full-project searches from the project root directory.
4. Regardless of whether Agent mode is enabled, code edits must only begin after receiving explicit instructions from the user. Do not act on your own.
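For context, here is roughly how rules 2 and 3 would look if moved from global user rules into a project rule file (a sketch assuming Cursor’s `.cursor/rules` project-rule format, i.e. an `.mdc` file with `alwaysApply: true` frontmatter as you mentioned; the filename and wording are illustrative):

```
---
description: Search tool discipline
alwaysApply: true
---
- Prefer Cursor's built-in tools (Glob, Grep, Read, SemanticSearch) over shell commands such as `ls`.
- Never search with bare `*` or `**/*` from the project root; scope every search to a specific subdirectory.
```

I don’t know whether the project-rule path is injected differently from global user rules, but it may be worth testing both placements.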

I’ve noticed that Gemini 3 Flash doesn’t ignore all of the rules. For my rules, it often fails to follow rules 2 and 3, but follows rules 1 and 4 quite well—sometimes it even feels like it follows them better than Gemini 3.1 Pro.

I suspect this may be related to the system prompt as well. Rules 2 and 3 involve Cursor’s built-in tools, which are introduced in the system prompt. If the system prompt isn’t clear or comprehensive enough, the model may form its own assumptions. For example, I’ve noticed that Gemini 3 Flash tends to favor using command-line tools, but for some reason, certain commands (like ls) sometimes return empty results. This leads to a pattern where Gemini 3 Flash repeatedly tries ls several times, fails, and only then starts considering tools like Glob and Grep.

I think that if the system prompt explicitly explained which built-in tools (like Glob, Grep) correspond to which command-line operations, the model might avoid this inefficient behavior from the start.
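For example, the kind of mapping I mean could be as simple as this (hypothetical wording, not Cursor’s actual system prompt; the tool call signatures are made up for illustration):

```
ls src/                ->  Glob("src/*")
grep -rn "foo" src/    ->  Grep("foo", path="src")
cat src/app.py         ->  Read("src/app.py")
find src -name "*.ts"  ->  Glob("src/**/*.ts")   # scoped, not from the root
```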

So I suspect that Gemini 3 Flash’s rule-following behavior is not only influenced by the rule prompt itself, but also by the system prompt, how the rules are embedded, and other factors.

Switching to a more powerful model is of course always better, but also more expensive. If we can get smaller models to perform better, then larger models will likely benefit from the same improvements as well.

Really helpful analysis, especially the note about the difference between rules 1/4 and 2/3. The fact that Flash follows the “simple” rules well (language, confirmation) but struggles with tool-related rules is an important detail.

Your hypothesis about the system prompt makes sense. If the system prompt doesn’t describe the built-in tools clearly enough, Flash-class models tend to fill in the gaps and fall back to familiar shell commands. We know Glob sometimes returned empty results, which could trigger a loop of “tried Glob, got nothing, switched to ls.” We already fixed that, but Flash might still hit this pattern more often.

One request: can you still share the Request ID from the chat where Flash ignored rules 2/3 (top right of the chat > Copy Request ID)? That will let us see what the model actually “sees” in the system prompt and how it processes the rules.

agree it sucks… even Sonnet 3.7 is better