GPT has a Pinocchio problem

I used Cursor extensively until about 5 months ago, and it worked flawlessly. I recently started using it again and noticed that the GPT models (especially 4 and 5) would lie extensively, yet still produce a nice happy summary with emojis saying everything was “perfect”.

Despite careful prompting (e.g. no mock/filler/placeholder/dummy code), it would still do the exact opposite and produce code riddled with placeholders and fake mock data, and only when I managed to catch it out would it apologise for lying. It’s definitely worse than the experience I had 5 months ago. Has anyone else experienced this?

I’ve had great experiences so far with GPT-5. It feels cheap and fast: 1–2 requests per round (on the 500-request legacy pricing).

Fast, Cheap, Good. Choose Two.

I had that experience a while ago on a specific codebase. Despite my best efforts with the prompt (“do not generate code just to make it pass; no dummy, fake, stub, or placeholder code”), it kept happening over and over.

But it didn’t matter. I wrote it in ALL CAPS, added it as project rules, memory entries, user rules, and direct prompts. I even told it: if you find dummy code, fix it instead of using it as a reference. Still, Cursor kept creating fake code just to make the tests pass, and producing a nice summary (whether I was using Claude Sonnet or Opus).

The solution? I had to manually remove all stub/fake/to-make-the-test-pass code. After that, the AI stopped outputting fake code entirely. So my guess is: the existing patterns in the source code carry a lot of weight—so much that the AI gets lost, forgets the prompt, and just follows what’s already there.

So, try removing all the fake code from your project. Hopefully, you’ll get your happy ending like I did.
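If you want to find all of it before deleting it by hand, here’s a minimal sketch of a scanner I’d use. The marker list is just my guess at common offenders; tune it to your codebase, and swap `*.py` for your language:

```python
#!/usr/bin/env python3
"""Find likely placeholder/stub code so it can be removed by hand."""
import re
import sys
from pathlib import Path

# Guessed markers for placeholder/stub code -- adjust for your project.
MARKERS = re.compile(
    r"TODO|FIXME|placeholder|dummy|mock|stub|NotImplementedError",
    re.IGNORECASE,
)

def scan(root: str = ".") -> None:
    for path in Path(root).rglob("*.py"):
        if ".venv" in path.parts:  # skip virtualenvs
            continue
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if MARKERS.search(line):
                print(f"{path}:{lineno}: {line.strip()}")

if __name__ == "__main__":
    scan(sys.argv[1] if len(sys.argv) > 1 else ".")
```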

This is it. Though I’ve found you can add a rule that basically says “We inherited this old codebase that has a lot of sloppy code, our job is to fix everything we touch and do it right this time” and that will help!
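If it helps, this is roughly how I’d phrase that as a project rule. The `.cursor/rules/legacy-cleanup.mdc` path and the frontmatter fields are my assumptions about how Cursor stores project rules; put the text wherever your rules actually live:

```
---
description: Treat this as an inherited legacy codebase and fix what we touch
alwaysApply: true
---
We inherited this old codebase and it has a lot of sloppy code. Our job is
to fix everything we touch and do it right this time. Never add placeholder,
stub, mock, or dummy code. If you find any, replace it with a real
implementation instead of copying the pattern.
```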

GPT-5 has strong rule-following capabilities. Check whether your rules and memory files mention “demo”. I once ran into a similar issue: the model thought it was building a demo and wrote a lot of mock implementations. The problem went away after I explicitly stated in my rules that the code should be robust.
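A quick way to audit that (a minimal sketch; the `.cursorrules` file and `.cursor/` directory are my assumptions about where your rules and memory live, so point it at the right paths for your setup):

```python
from pathlib import Path

# Flag any "demo" wording in rule/memory files that could bias the model
# toward mock implementations.
for pattern in (".cursorrules", ".cursor/**/*"):
    for path in Path(".").glob(pattern):
        if not path.is_file():
            continue
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if "demo" in line.lower():
                print(f"{path}:{lineno}: {line.strip()}")
```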