Auto agent mode never follows the prompt and lies

Every response ignores specific instructions from the prompt (when the IDE doesn’t crash), introduces new errors, lies, and doesn’t fix the 2 simple behave scenarios I asked it to fix. Chat ID is 5a24bf9b-0f93-4158-bed5-b914fb8260ad

Also, the files .cursor/rules/project_rules.md, AGENTS.md, and the files they reference are usually ignored.

I can’t upload these files, since .md files are not supported by this forum.

Can you provide some example prompts and rules and what the agent did instead?

I asked it to fix 2 behave tests (specifying them, telling it to follow /AGENTS.md /STYLE.md /TESTING.md /ARCHITECTURE.md /REPO_MAP.md /DOMAIN_MAP.md /ORDER_LIFECYCLE.md /ARCHITECTURE_DECISIONS.md /ARCHITECTURE_GUARDRAILS.md /AI_CONTEXT.md /CODEBASE_ENTRYPOINTS.md, and to keep iterating until they were fixed), and it gave a response that does not follow the PLAN → PATCH → VERIFY policy in AGENTS.md, directly violates STYLE.md, introduces mypy and ruff errors, does not fix the scenarios it was asked to fix, does not write the tests specified in TESTING.md, and does not iterate as requested.

javier@jm:~/proyectos/emprendimientos/lonca/shipii/src/Shipii$ cat AGENTS.md

AGENTS.md — Repository-wide AI Agent Rules (Large Repo Mode)

Hard Constraint

Agents must treat AGENTS.md as a system instruction.

Ignoring this file or skipping its rules is considered an invalid response.


Agent Behavior

This repository is optimized for AI-assisted development.

Agents must treat repository documentation as binding constraints.

Ignoring repository documentation is considered a failure of the task.


Purpose

This is a large Django monorepo. Changes must be safe, minimal, and verifiable.

These rules are mandatory for any AI agent working on this repository.
Additional rules live in:

  • STYLE.md
  • TESTING.md
  • ARCHITECTURE.md

If there is a conflict, the more restrictive rule wins.

Additional repository documentation:

  • REPO_MAP.md
  • DOMAIN_MAP.md
  • ORDER_LIFECYCLE.md
  • ARCHITECTURE_DECISIONS.md
  • ARCHITECTURE_GUARDRAILS.md
  • AI_CONTEXT.md
  • CODEBASE_ENTRYPOINTS.md

Language

Prompts may be in Spanish. Follow these rules regardless of prompt language.

Respond in the same language as the prompt.


Core Principles

  1. Preserve existing behavior unless explicitly instructed otherwise.
  2. Avoid breaking changes.
  3. Prefer minimal and localized modifications.
  4. Never perform drive-by refactors.
  5. Do not invent modules, classes, or imports.
  6. Do not reduce quality gates.

Mandatory Context Loading (Hard Constraint)

Before performing any reasoning or proposing implementation, the agent MUST read and acknowledge the following files:

  • AGENTS.md
  • AI_CONTEXT.md
  • ARCHITECTURE_GUARDRAILS.md
  • DOMAIN_MAP.md
  • ORDER_LIFECYCLE.md
  • REPO_MAP.md
  • TESTING.md
  • STYLE.md

If any of these files cannot be located or read, the agent MUST stop and report it before continuing.

The agent MUST explicitly confirm the rules it is following by referencing the section titles used.

The agent MUST NOT produce code or implementation proposals until the PLAN phase is completed.

The agent MUST also inspect CODEBASE_ENTRYPOINTS.md before modifying routing, middleware, tasks, or websocket consumers.


DRY + SOLID (mandatory)

All changes must respect DRY and SOLID, especially:

  • SRP: keep responsibilities separated (views thin, services handle workflows, models persistence).
  • OCP: prefer extension over modification; avoid spreading conditional logic across domains.
  • DRY: reuse existing logic/patterns; do not duplicate business rules across modules.

If a change introduces duplication or violates SRP/OCP, the agent must refactor within scope to correct it.
Do not introduce new architecture patterns unless explicitly requested.
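The SRP split described above (views thin, services own workflows) can be sketched as follows. This is a hypothetical illustration, not code from the repository: `Order`, `cancel_order`, and `cancel_order_view` are made-up names standing in for the real Django view/service pair.

```python
# Hypothetical SRP sketch: the view stays thin and delegates the
# workflow to a service. Names are illustrative, not from the repo.
from dataclasses import dataclass


@dataclass
class Order:
    id: int
    status: str = "pending"


def cancel_order(order: Order) -> Order:
    """Service layer: owns the business rule for cancellation."""
    if order.status not in ("pending", "confirmed"):
        raise ValueError(f"cannot cancel order in status {order.status!r}")
    order.status = "cancelled"
    return order


def cancel_order_view(order: Order) -> dict:
    """View layer: thin — it just calls the service and shapes the response."""
    cancel_order(order)
    return {"id": order.id, "status": order.status}
```

Duplicating the status check inside the view as well would violate DRY; the rule lives in the service only.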


Design Patterns (use when justified)

Agents may use design patterns ONLY when:

  • The codebase already has an established extension point for the problem (factory/registry/strategy/base interface), or
  • The change removes duplication across 2+ call sites without introducing new architectural complexity.

Preferred patterns (only if present in the repo):

  • Factory/Registry for pluggable implementations (e.g., cancellation handlers, payment gateways)
  • Strategy for pricing/validation selection
  • Adapter for external providers (payments, maps, notifications)

Forbidden:

  • Introducing new pattern frameworks or new abstraction layers unless explicitly requested.

Evidence requirement:

  • When proposing a pattern, cite the existing pattern evidence from CODEBASE_ENTRYPOINTS.md or the relevant module.
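A minimal registry of the kind named above (e.g. pluggable cancellation handlers) might look like the sketch below. All names are hypothetical; under these rules a real change would have to reuse the repository's existing registry, citing it as evidence, rather than introduce this one.

```python
# Hypothetical Factory/Registry sketch for pluggable implementations.
from typing import Callable, Dict

_HANDLERS: Dict[str, Callable[[dict], str]] = {}


def register(reason: str) -> Callable:
    """Decorator that plugs a handler into the registry by reason code."""
    def wrap(fn: Callable[[dict], str]) -> Callable[[dict], str]:
        _HANDLERS[reason] = fn
        return fn
    return wrap


@register("customer_request")
def cancel_by_customer(order: dict) -> str:
    return f"order {order['id']} cancelled by customer"


@register("payment_failed")
def cancel_on_payment_failure(order: dict) -> str:
    return f"order {order['id']} cancelled: payment failed"


def cancel(order: dict, reason: str) -> str:
    """Dispatch point: adding a new reason means adding a handler (OCP),
    not editing this function and spreading conditionals around."""
    try:
        return _HANDLERS[reason](order)
    except KeyError:
        raise ValueError(f"no cancellation handler for {reason!r}") from None
```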

Schema and Migration Changes

Schema and migration changes are allowed ONLY when:

  • explicitly approved by the user
  • necessary for the requested feature or bug fix
  • backward compatibility impact is explained

When proposing a schema change the agent must:

  • explain the model changes
  • explain migration impact
  • ensure migrations are deterministic
  • add tests covering the new behavior

Change Budget

Unless explicitly requested:

  • Maximum changed files per iteration: 6
  • Maximum net new LOC per iteration: 250
  • Avoid touching more than one Django app per iteration

If a task requires more scope, present a plan first.


Forbidden Changes

Agents must NEVER:

  • Disable Ruff rules
  • Add blanket noqa
  • Use except Exception
  • Use bare except:
  • Silence errors with pass
  • Remove tests
  • Reduce coverage
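To make the exception-handling bans concrete, here is a hedged sketch of what is acceptable instead; `load_config` and its path are invented for illustration.

```python
# Contrast with the forbidden patterns: no `except Exception`, no bare
# `except:`, no silencing with `pass`. Catch only what you can handle.
import json
from pathlib import Path


def load_config(path: str) -> dict:
    try:
        return json.loads(Path(path).read_text())
    except FileNotFoundError:
        return {}  # explicit, documented fallback for a missing file
    except json.JSONDecodeError as exc:
        # Re-raise with context instead of swallowing the error.
        raise ValueError(f"invalid config in {path}: {exc}") from exc
```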

Restricted Changes (require explicit approval)

The following changes MUST NOT be implemented unless the user explicitly approves them:

  • Database schema modifications
  • Migration creation or modification
  • Renaming model fields
  • Serializer public field changes
  • API response contract changes

Required Working Format

All non-trivial tasks must follow:

PLAN → PATCH → VERIFY

Exception for trivial changes

For trivial changes (typos, formatting, or documentation-only updates),
the agent may skip PLAN/PATCH/VERIFY.

However:

  • STYLE.md rules must still be respected.
  • TESTING.md rules must still be respected.
  • No production code may be modified.
  • No architecture rules may be violated.

If there is any uncertainty about whether a change is trivial,
the agent must follow the full PLAN → PATCH → VERIFY workflow.


PLAN

Provide:

  • Goal
  • Scope
  • Files to modify
  • Impact on public contracts (must be NO unless requested)
  • Tests to add
  • Commands that will be run
  • Consult AI_CONTEXT.md first for system overview when starting a task.
  • Consult REPO_MAP.md to locate the correct layer/module and the existing pattern to follow.
  • Consult ARCHITECTURE_DECISIONS.md to ensure the change does not violate an existing decision.
  • Consult DOMAIN_MAP.md before introducing cross-domain interactions or modifying service verticals.
  • Consult ORDER_LIFECYCLE.md when modifying order status transitions or workflows.
  • If a proposed change conflicts with DOMAIN_MAP.md or an ADR, stop and request explicit approval.
  • If a change modifies order lifecycle behavior, ORDER_LIFECYCLE.md must be updated accordingly.

If the plan includes schema, migration, or model definition changes:

  • Stop and request explicit approval from the user.
  • Clearly explain:
    • why the change is necessary
    • migration impact
    • backward compatibility impact

Evidence requirement:

When proposing a change that depends on existing behavior,
the agent must cite at least one code reference (file::symbol)
showing where the current behavior exists.

If contract impact is YES, list exactly which endpoints/serializers/fields change.


Plan Acceptance Checklist (Hard Gate)

A PLAN is considered valid only if it includes:

  • Evidence: at least 1 code reference (file::symbol) for the current behavior
  • Contract impact: explicit YES/NO + exact endpoints/serializers/fields if YES
  • Lifecycle impact: explicit YES/NO + states/transitions if YES
  • Cross-domain impact: explicit YES/NO + approved entrypoints if YES
  • Tests: explicit list of test files / behave features to add or update
  • Commands: explicit docker exec commands to run (from TESTING.md)

If any item is missing, the agent MUST NOT proceed to PATCH.


PATCH

Implement only the planned changes.

Do not expand scope.

  • Do not introduce new cross-domain model mutations or direct imports that bypass service boundaries.
    Use approved cross-domain entrypoints (ARCHITECTURE_GUARDRAILS.md / DOMAIN_MAP.md).

VERIFY

Run verification commands through Docker.

Additionally verify:

  • If the change introduces or modifies cross-domain interactions,
    update DOMAIN_MAP.md accordingly.

  • If new modules, services, or architectural patterns are introduced,
    update REPO_MAP.md and/or ARCHITECTURE_DECISIONS.md if applicable.

  • Documentation must never contradict the codebase.

  • If DOMAIN_MAP.md or REPO_MAP.md or ORDER_LIFECYCLE.md become outdated,
    they must be updated in the same change.

  • If domain inventory, ownership, or major flows change,
    update AI_CONTEXT.md accordingly.

  • If runtime routing, auth middleware, celery init, or websocket wiring changes,
    update CODEBASE_ENTRYPOINTS.md accordingly.


Mandatory Output Structure

Every response involving changes MUST follow this exact structure:

PLAN

  • Goal
  • Scope
  • Files to modify
  • Cross-domain impact
  • Lifecycle impact
  • Tests to add
  • Commands to run
  • Rules acknowledged: cite at least 1 constraint from AGENTS.md and 1 from AI_CONTEXT.md by section title.
  • If the plan includes schema/migration changes, stop and request explicit approval.
    (Do not implement unless explicitly authorized.)

PATCH

  • Implementation description
  • Modified files
  • Code snippets if needed

VERIFY

  • Commands executed
  • Expected outputs
  • Coverage impact
  • Documentation updates required

If the agent skips any of these sections, the response is incomplete.


Docker Execution

All commands must run through Docker.

Example:

docker exec shipii_local_django pytest


Mandatory Verification Checklist

Before finishing a response verify ALL of the following:

  • Ruff passes
  • mypy passes
  • pytest passes
  • behave scenarios pass
  • global coverage >= 95%
  • changed modules are near 100% coverage
  • no architecture rules were violated
  • no broad exceptions were introduced
  • no public contracts were modified

If any check fails, fix it first.


Response Format

Every response containing code must include:

  • Summary of changes
  • List of modified files
  • Contract impact confirmation
  • Tests added
  • Commands executed
  • Results of those commands

Source of Truth

The repository codebase is the ultimate source of truth.

Agents must inspect existing implementations before proposing new structures.

Navigation and architecture references:

  • REPO_MAP.md → repository navigation index
  • DOMAIN_MAP.md → cross-domain interaction map
  • ORDER_LIFECYCLE.md → valid order state transitions
  • ARCHITECTURE_DECISIONS.md → existing architectural decisions
  • ARCHITECTURE_GUARDRAILS.md → architectural boundaries
  • AI_CONTEXT.md → high-level system summary for fast orientation (derived from code)
  • CODEBASE_ENTRYPOINTS.md → canonical runtime/API/async/WS entrypoints for fast navigation

If a proposed change conflicts with an ADR:

  • do NOT implement it
  • explain the conflict
  • request explicit confirmation

Rule Priority

If there is any conflict between instructions, the following priority order must be applied (highest priority first):

  1. Explicit instructions from the user prompt
  2. ARCHITECTURE_DECISIONS.md
  3. ARCHITECTURE_GUARDRAILS.md
  4. ORDER_LIFECYCLE.md
  5. DOMAIN_MAP.md
  6. REPO_MAP.md
  7. ARCHITECTURE.md
  8. TESTING.md
  9. STYLE.md
  10. AGENTS.md
  11. Default framework conventions (Django, Python)

Priority Interpretation

When two rules conflict:

  • Architecture rules override style preferences.
  • Testing rules override refactoring suggestions.
  • Style rules must never weaken architecture or testing guarantees.

Example:

If a refactor would improve code style but risks breaking tests or architecture constraints, the refactor must NOT be applied.


Safety Override

If any proposed change could:

  • break public APIs
  • modify database schema
  • change serializer contracts
  • introduce or modify migrations
  • alter business logic behavior

the agent must stop and request explicit confirmation from the user before proceeding.


Scope Limitation

Agents must not expand the scope of a task beyond the user’s request.

If a broader refactor seems beneficial, the agent must:

  1. Explain the proposed improvement
  2. Ask for explicit approval
  3. Wait for confirmation before implementing

Definition of Done

A change is considered complete only if ALL of the following are satisfied:

  • Ruff passes
  • mypy passes
  • pytest passes
  • behave scenarios pass
  • coverage >= 95%
  • No hidden behavior changes:
    Any behavior change must include a regression test that fails before the change and passes after.
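The "regression test that fails before the change" rule can be sketched as below; `normalize_phone` and its bug are hypothetical stand-ins for a real fix.

```python
# Hypothetical regression-test sketch: the test encodes the bug report,
# fails against the pre-fix implementation, and passes after the fix.
def normalize_phone(raw: str) -> str:
    """Fixed version: strips spaces AND dashes.
    (The buggy version only stripped spaces, so "55-1234" kept its dash.)"""
    return raw.replace(" ", "").replace("-", "")


def test_normalize_phone_strips_dashes():
    # Would fail before the fix, passes after it.
    assert normalize_phone("55-12 34") == "551234"


test_normalize_phone_strips_dashes()
```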

Additionally:

  • If cross-domain interactions were introduced or modified → DOMAIN_MAP.md must be updated.
  • If order state transitions were modified → ORDER_LIFECYCLE.md must be updated.
  • If architectural structure changed → REPO_MAP.md or ARCHITECTURE_DECISIONS.md must be updated.
  • If system overview changed → AI_CONTEXT.md must be updated.
  • If tasks/integrations are changed:
    idempotency must be documented and tested (double-execution safe).
  • If schema or migrations were changed → migrations must be included and verified.
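The "double-execution safe" requirement for tasks can be sketched like this. `mark_delivered` and the in-memory store are hypothetical stand-ins for a real Celery task and database row.

```python
# Idempotency sketch: running the same task twice must leave the same
# end state, with side effects fired exactly once.
ORDERS = {42: {"status": "in_transit", "delivered_events": 0}}


def mark_delivered(order_id: int) -> dict:
    """Idempotent task: a redelivered message is a no-op."""
    order = ORDERS[order_id]
    if order["status"] != "delivered":   # guard makes re-runs safe
        order["status"] = "delivered"
        order["delivered_events"] += 1   # side effect fires exactly once
    return order


mark_delivered(42)
mark_delivered(42)                       # simulated duplicate delivery
assert ORDERS[42] == {"status": "delivered", "delivered_events": 1}
```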

Rule Enforcement

If the agent detects that its own proposed change would violate:

  • ARCHITECTURE_GUARDRAILS.md
  • DOMAIN_MAP.md
  • ORDER_LIFECYCLE.md
  • ARCHITECTURE_DECISIONS.md

the agent MUST:

  1. Stop implementation
  2. Explain the conflict
  3. Request explicit approval from the user

Stop Conditions

The agent MUST stop and ask for input if:

  • The change would require schema/migration changes (restricted; requires explicit approval).
  • The change would alter public API contracts (restricted; requires explicit approval).
  • The change would modify serializer public fields (restricted; requires explicit approval).
javier@jm:~/proyectos/emprendimientos/lonca/shipii/src/Shipii$

I may not have enough experience, but I think this might be too much for Auto to handle. Do higher-powered models like Opus or Sonnet do better with the same prompt? You may be hitting a limitation of Auto.

It really doesn’t seem to be an issue with rules size/length. Lately, Auto and Composer 1.5 have been ignoring 90% of my rules, even after I trimmed them down into a couple of 30-line files. GPT-5.4 and all Claude models still follow all of my rules without a hitch - this is an issue exclusive to Auto and Composer 1.5.

It was fine a month or so ago, but now it’s just unusable: those models don’t do anything I tell them to in rules, and if I explicitly mention a rule in a conversation, they follow only that rule for one reply and then ignore it completely afterwards.


It’s a shame. I do see Auto getting confused and making mistakes more and more. I have found that Auto struggles more the longer a chat goes on. If you are forced to use Auto, then making new chats more often may be beneficial to keep it focused and on track.

I’ve given up. It can scaffold a feature, but that’s about the limit of Auto/Composer. I’ll set it a task now, and 4 out of 5 times I end up stashing the changes because it went on a side quest or didn’t fix the bug. I don’t trust anything it writes, and I can’t trust it to find bugs. It’s really disappointing, because I’ve been using it for a year or so. It just feels like the leadership at Cursor decided no one is going to want to use IDEs anymore, so they went on a side quest too, and now Cursor doesn’t work.

I’m likely cancelling my sub on renewal. I just don’t see this thing being worth the money.

I literally asked it to load some fonts from an assets folder. I tagged the folder, and this is what it immediately thought it should do. I just said to load the fonts from the assets folder and make sure they are referenced. I don’t have time for this; it’s just adding work.


I’m a long-time Cursor user, and honestly, this feels like the company changed the deal after users had already bought in.

One of the original appeals of Cursor was Auto mode. It felt like a practical feature you could lean on heavily, especially with the understanding that slower responses were part of the tradeoff. That model made sense, and it helped justify the subscription.

Now the experience feels very different. Prices are higher, limits feel tighter, and Auto mode often feels like a black box that routes users to weaker models without enough transparency. The result is inconsistent quality, unreliable code edits, and a much lower sense of trust in the product.

That is what makes this so frustrating: users were brought in with one understanding of the product, and then later the value proposition shifted. Paying more would be easier to accept if the service had clearly improved. Instead, it feels more opaque, more restrictive, and less reliable.

I’m not complaining just because something costs money. I’m complaining because the product feels worse while asking users to pay more. For long-time users, that feels like a betrayal of trust.

Cursor seriously needs to rethink how it communicates pricing, limits, and Auto mode behavior. Loyal users should not feel like they signed up for one product and ended up with another.

All these AI coding agent companies were way underselling their products when this all went mainstream over the past couple of years. $20/mo for unlimited requests from decent models was always an unrealistic expectation; no one can make that work financially. Even if it only cost Cursor $20 in API calls (it most likely doesn’t), the value the customer gets would be so much more than $20 that they would be foolish not to charge more. So in reality, we were in a honeymoon phase where things were intentionally underpriced to drive adoption of AI agents while these companies raced to claim market share. We are also being left with a handful of colossal tech companies for our AI agents (Anthropic, Cursor, Google), which means pricing will face even less competition.

Auto does not follow instructions as well as, for example, the GPT models. I just tested that today: I gave the same instruction to Auto and GPT 5.3, wanting to see whether they used shared memory, as they should have, when creating a plan. GPT did; Auto did not. That happens all the time. You need to stay alert when using Auto, or you run into trouble.

OK, I thought I was the only one… Auto has been unusable the past few days, and it’s making me burn through my quota of other models. If this continues I will have to fall back to Copilot, which is a shame because I much prefer Cursor, but I can’t justify spending money arguing with a model that clearly is not usable in production.

It frequently just straight-up ignores instructions, and instead of fixing the issues I ask it to, it modifies unit tests so they pass with the existing code rather than fixing the bug, even when the fix is laid out for it step by step.