Gemini-2.5-pro / Will Lie and Deceive

I used Gemini pro for the first time yesterday.
I began with small UI tasks. It handled these tasks without issues.

I started feeling comfortable enough to give it a complex task:
"We need to create a window that will allow the user to select an Icon."
In short, I watched Gemini scroll code across my screen until the server would not start. After backtracking, Gemini got the server running.

I opened my project to find the UI a complete mess.
Gemini explained that I didn’t need any of the other components to be visible and that I could just use this new component I had never seen. The lie was elaborate, and it tried to convince me. This happened twice.

In short, I will never use Gemini again.

By all means give it a second try later.

Gemini 2.5 pro is a great model as long as there’s enough project rules and documentation for it to work with. The more it knows about your requirements, the better decisions it makes.

It also works very well for discussions and planning. I like to treat it as a coworker and discuss the solutions and the whole project with it.

Also, it’s good at doing multiple tasks in the same response (as long as it doesn’t error out on tool usage), which allows us to save fast requests.

Finally, what really helps is making it write down what it wants to do in markdown files that you can then both iterate on, from little tasks to epic-scale new features it builds from scratch.

The main downsides of Gemini 2.5 pro:

  • terrible issues with tool usage (it either fails when trying to use tools, or says it will now code and then stops, probably erroring out under the hood)
  • obfuscated thinking process (another model summarizes it, so we can’t use the thinking process as feedback for improving our prompts, unless we use a hack)

I documented my project for 1 year, then found Cursor while I was looking for a programmer. I have images and documentation galore.
I’ll give Gemini another try. Also, just prior to my session with Gemini, I cleared the conversation and deleted the index. I started fresh, so it’s not that either. I’ll try again and see how it goes. Claude 4 is dope!

All LLMs lie, and they don’t even notice.

That is a surprisingly human behavior, btw.

Humans never lie.

I gave Gemini another session and ended up reverting 50% of what I did.
In my opinion, Claude-3.7-thinking is better than Gemini.
