Auto agent mode never follows the prompt and lies

Every response ignores specific instructions from the prompt (when the IDE doesn’t crash), introduces new errors, lies, and doesn’t fix the 2 simple behave scenarios I asked it to fix. Chat ID is 5a24bf9b-0f93-4158-bed5-b914fb8260ad

Also, the files .cursor/rules/project_rules.md, AGENTS.md, and the files they reference are usually ignored.

I can’t upload these files, since .md files are not supported by this forum.

Can you provide some example prompts and rules and what the agent did instead?

I asked it to fix 2 behave tests (specifying them, telling it to follow /AGENTS.md /STYLE.md /TESTING.md /ARCHITECTURE.md /REPO_MAP.md /DOMAIN_MAP.md /ORDER_LIFECYCLE.md /ARCHITECTURE_DECISIONS.md /ARCHITECTURE_GUARDRAILS.md /AI_CONTEXT.md /CODEBASE_ENTRYPOINTS.md, and to keep iterating until they were fixed), and it gave a response that does not follow the PLAN → PATCH → VERIFY policy in AGENTS.md, directly violates STYLE.md, introduces mypy and ruff errors, does not fix the scenarios it was asked to fix, does not write the tests specified in TESTING.md, and does not iterate as requested.

javier@jm:~/proyectos/emprendimientos/lonca/shipii/src/Shipii$ cat AGENTS.md

AGENTS.md — Repository-wide AI Agent Rules (Large Repo Mode)

Hard Constraint

Agents must treat AGENTS.md as a system instruction.

Ignoring this file or skipping its rules is considered an invalid response.


Agent Behavior

This repository is optimized for AI-assisted development.

Agents must treat repository documentation as binding constraints.

Ignoring repository documentation is considered a failure of the task.


Purpose

This is a large Django monorepo. Changes must be safe, minimal, and verifiable.

These rules are mandatory for any AI agent working on this repository.
Additional rules live in:

  • STYLE.md
  • TESTING.md
  • ARCHITECTURE.md

If there is a conflict, the more restrictive rule wins.

Additional repository documentation:

  • REPO_MAP.md
  • DOMAIN_MAP.md
  • ORDER_LIFECYCLE.md
  • ARCHITECTURE_DECISIONS.md
  • ARCHITECTURE_GUARDRAILS.md
  • AI_CONTEXT.md
  • CODEBASE_ENTRYPOINTS.md

Language

Prompts may be in Spanish. Follow these rules regardless of prompt language.

Respond in the same language as the prompt.


Core Principles

  1. Preserve existing behavior unless explicitly instructed otherwise.
  2. Avoid breaking changes.
  3. Prefer minimal and localized modifications.
  4. Never perform drive-by refactors.
  5. Do not invent modules, classes, or imports.
  6. Do not reduce quality gates.

Mandatory Context Loading (Hard Constraint)

Before performing any reasoning or proposing implementation, the agent MUST read and acknowledge the following files:

  • AGENTS.md
  • AI_CONTEXT.md
  • ARCHITECTURE_GUARDRAILS.md
  • DOMAIN_MAP.md
  • ORDER_LIFECYCLE.md
  • REPO_MAP.md
  • TESTING.md
  • STYLE.md

If any of these files cannot be located or read, the agent MUST stop and report it before continuing.

The agent MUST explicitly confirm the rules it is following by referencing the section titles used.

The agent MUST NOT produce code or implementation proposals until the PLAN phase is completed.

The agent MUST also inspect CODEBASE_ENTRYPOINTS.md before modifying routing, middleware, tasks, or websocket consumers.


DRY + SOLID (mandatory)

All changes must respect DRY and SOLID, especially:

  • SRP: keep responsibilities separated (views thin, services handle workflows, models persistence).
  • OCP: prefer extension over modification; avoid spreading conditional logic across domains.
  • DRY: reuse existing logic/patterns; do not duplicate business rules across modules.

If a change introduces duplication or violates SRP/OCP, the agent must refactor within scope to correct it.
Do not introduce new architecture patterns unless explicitly requested.
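The SRP split described above (views thin, services own workflows) can be sketched as follows. This is a hypothetical illustration, not code from the repository: `Order`, `cancel_order`, and `cancel_order_view` are made-up names standing in for the real Django view/service pair.

```python
# Hypothetical SRP sketch: the view stays thin and delegates the
# workflow to a service. Names are illustrative, not from the repo.
from dataclasses import dataclass


@dataclass
class Order:
    id: int
    status: str = "pending"


def cancel_order(order: Order) -> Order:
    """Service layer: owns the business rule for cancellation."""
    if order.status not in ("pending", "confirmed"):
        raise ValueError(f"cannot cancel order in status {order.status!r}")
    order.status = "cancelled"
    return order


def cancel_order_view(order: Order) -> dict:
    """View layer: thin — it just calls the service and shapes the response."""
    cancel_order(order)
    return {"id": order.id, "status": order.status}
```

Duplicating the status check inside the view as well would violate DRY; the rule lives in the service only.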


Design Patterns (use when justified)

Agents may use design patterns ONLY when:

  • The codebase already has an established extension point for the problem (factory/registry/strategy/base interface), or
  • The change removes duplication across 2+ call sites without introducing new architectural complexity.

Preferred patterns (only if present in the repo):

  • Factory/Registry for pluggable implementations (e.g., cancellation handlers, payment gateways)
  • Strategy for pricing/validation selection
  • Adapter for external providers (payments, maps, notifications)

Forbidden:

  • Introducing new pattern frameworks or new abstraction layers unless explicitly requested.

Evidence requirement:

  • When proposing a pattern, cite the existing pattern evidence from CODEBASE_ENTRYPOINTS.md or the relevant module.
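A minimal registry of the kind named above (e.g. pluggable cancellation handlers) might look like the sketch below. All names are hypothetical; under these rules a real change would have to reuse the repository's existing registry, citing it as evidence, rather than introduce this one.

```python
# Hypothetical Factory/Registry sketch for pluggable implementations.
from typing import Callable, Dict

_HANDLERS: Dict[str, Callable[[dict], str]] = {}


def register(reason: str) -> Callable:
    """Decorator that plugs a handler into the registry by reason code."""
    def wrap(fn: Callable[[dict], str]) -> Callable[[dict], str]:
        _HANDLERS[reason] = fn
        return fn
    return wrap


@register("customer_request")
def cancel_by_customer(order: dict) -> str:
    return f"order {order['id']} cancelled by customer"


@register("payment_failed")
def cancel_on_payment_failure(order: dict) -> str:
    return f"order {order['id']} cancelled: payment failed"


def cancel(order: dict, reason: str) -> str:
    """Dispatch point: adding a new reason means adding a handler (OCP),
    not editing this function and spreading conditionals around."""
    try:
        return _HANDLERS[reason](order)
    except KeyError:
        raise ValueError(f"no cancellation handler for {reason!r}") from None
```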

Schema and Migration Changes

Schema and migration changes are allowed ONLY when:

  • explicitly approved by the user
  • necessary for the requested feature or bug fix
  • backward compatibility impact is explained

When proposing a schema change the agent must:

  • explain the model changes
  • explain migration impact
  • ensure migrations are deterministic
  • add tests covering the new behavior

Change Budget

Unless explicitly requested:

  • Maximum changed files per iteration: 6
  • Maximum net new LOC per iteration: 250
  • Avoid touching more than one Django app per iteration

If a task requires more scope, present a plan first.


Forbidden Changes

Agents must NEVER:

  • Disable Ruff rules
  • Add blanket noqa
  • Use except Exception
  • Use bare except:
  • Silence errors with pass
  • Remove tests
  • Reduce coverage
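To make the exception-handling bans concrete, here is a hedged sketch of what is acceptable instead; `load_config` and its path are invented for illustration.

```python
# Contrast with the forbidden patterns: no `except Exception`, no bare
# `except:`, no silencing with `pass`. Catch only what you can handle.
import json
from pathlib import Path


def load_config(path: str) -> dict:
    try:
        return json.loads(Path(path).read_text())
    except FileNotFoundError:
        return {}  # explicit, documented fallback for a missing file
    except json.JSONDecodeError as exc:
        # Re-raise with context instead of swallowing the error.
        raise ValueError(f"invalid config in {path}: {exc}") from exc
```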

Restricted Changes (require explicit approval)

The following changes MUST NOT be implemented unless the user explicitly approves them:

  • Database schema modifications
  • Migration creation or modification
  • Renaming model fields
  • Serializer public field changes
  • API response contract changes

Required Working Format

All non-trivial tasks must follow:

PLAN → PATCH → VERIFY

Exception for trivial changes

For trivial changes (typos, formatting, or documentation-only updates),
the agent may skip PLAN/PATCH/VERIFY.

However:

  • STYLE.md rules must still be respected.
  • TESTING.md rules must still be respected.
  • No production code may be modified.
  • No architecture rules may be violated.

If there is any uncertainty about whether a change is trivial,
the agent must follow the full PLAN → PATCH → VERIFY workflow.


PLAN

Provide:

  • Goal
  • Scope
  • Files to modify
  • Impact on public contracts (must be NO unless requested)
  • Tests to add
  • Commands that will be run
  • Consult AI_CONTEXT.md first for system overview when starting a task.
  • Consult REPO_MAP.md to locate the correct layer/module and the existing pattern to follow.
  • Consult ARCHITECTURE_DECISIONS.md to ensure the change does not violate an existing decision.
  • Consult DOMAIN_MAP.md before introducing cross-domain interactions or modifying service verticals.
  • Consult ORDER_LIFECYCLE.md when modifying order status transitions or workflows.
  • If a proposed change conflicts with DOMAIN_MAP.md or an ADR, stop and request explicit approval.
  • If a change modifies order lifecycle behavior, ORDER_LIFECYCLE.md must be updated accordingly.

If the plan includes schema, migration, or model definition changes:

  • Stop and request explicit approval from the user.
  • Clearly explain:
    • why the change is necessary
    • migration impact
    • backward compatibility impact

Evidence requirement:

When proposing a change that depends on existing behavior,
the agent must cite at least one code reference (file::symbol)
showing where the current behavior exists.

If contract impact is YES, list exactly which endpoints/serializers/fields change.


Plan Acceptance Checklist (Hard Gate)

A PLAN is considered valid only if it includes:

  • Evidence: at least 1 code reference (file::symbol) for the current behavior
  • Contract impact: explicit YES/NO + exact endpoints/serializers/fields if YES
  • Lifecycle impact: explicit YES/NO + states/transitions if YES
  • Cross-domain impact: explicit YES/NO + approved entrypoints if YES
  • Tests: explicit list of test files / behave features to add or update
  • Commands: explicit docker exec commands to run (from TESTING.md)

If any item is missing, the agent MUST NOT proceed to PATCH.


PATCH

Implement only the planned changes.

Do not expand scope.

  • Do not introduce new cross-domain model mutations or direct imports that bypass service boundaries.
    Use approved cross-domain entrypoints (ARCHITECTURE_GUARDRAILS.md / DOMAIN_MAP.md).

VERIFY

Run verification commands through Docker.

Additionally verify:

  • If the change introduces or modifies cross-domain interactions,
    update DOMAIN_MAP.md accordingly.

  • If new modules, services, or architectural patterns are introduced,
    update REPO_MAP.md and/or ARCHITECTURE_DECISIONS.md if applicable.

  • Documentation must never contradict the codebase.

  • If DOMAIN_MAP.md or REPO_MAP.md or ORDER_LIFECYCLE.md become outdated,
    they must be updated in the same change.

  • If domain inventory, ownership, or major flows change,
    update AI_CONTEXT.md accordingly.

  • If runtime routing, auth middleware, celery init, or websocket wiring changes,
    update CODEBASE_ENTRYPOINTS.md accordingly.


Mandatory Output Structure

Every response involving changes MUST follow this exact structure:

PLAN

  • Goal
  • Scope
  • Files to modify
  • Cross-domain impact
  • Lifecycle impact
  • Tests to add
  • Commands to run
  • Rules acknowledged: cite at least 1 constraint from AGENTS.md and 1 from AI_CONTEXT.md by section title.
  • If the plan includes schema/migration changes, stop and request explicit approval.
    (Do not implement unless explicitly authorized.)

PATCH

  • Implementation description
  • Modified files
  • Code snippets if needed

VERIFY

  • Commands executed
  • Expected outputs
  • Coverage impact
  • Documentation updates required

If the agent skips any of these sections, the response is incomplete.


Docker Execution

All commands must run through Docker.

Example:

docker exec shipii_local_django pytest


Mandatory Verification Checklist

Before finishing a response verify ALL of the following:

  • Ruff passes
  • mypy passes
  • pytest passes
  • behave scenarios pass
  • global coverage >= 95%
  • changed modules are near 100% coverage
  • no architecture rules were violated
  • no broad exceptions were introduced
  • no public contracts were modified

If any check fails, fix it first.


Response Format

Every response containing code must include:

  • Summary of changes
  • List of modified files
  • Contract impact confirmation
  • Tests added
  • Commands executed
  • Results of those commands

Source of Truth

The repository codebase is the ultimate source of truth.

Agents must inspect existing implementations before proposing new structures.

Navigation and architecture references:

  • REPO_MAP.md → repository navigation index
  • DOMAIN_MAP.md → cross-domain interaction map
  • ORDER_LIFECYCLE.md → valid order state transitions
  • ARCHITECTURE_DECISIONS.md → existing architectural decisions
  • ARCHITECTURE_GUARDRAILS.md → architectural boundaries
  • AI_CONTEXT.md → high-level system summary for fast orientation (derived from code)
  • CODEBASE_ENTRYPOINTS.md → canonical runtime/API/async/WS entrypoints for fast navigation

If a proposed change conflicts with an ADR:

  • do NOT implement it
  • explain the conflict
  • request explicit confirmation

Rule Priority

If there is any conflict between instructions, the following priority order must be applied (highest priority first):

  1. Explicit instructions from the user prompt
  2. ARCHITECTURE_DECISIONS.md
  3. ARCHITECTURE_GUARDRAILS.md
  4. ORDER_LIFECYCLE.md
  5. DOMAIN_MAP.md
  6. REPO_MAP.md
  7. ARCHITECTURE.md
  8. TESTING.md
  9. STYLE.md
  10. AGENTS.md
  11. Default framework conventions (Django, Python)

Priority Interpretation

When two rules conflict:

  • Architecture rules override style preferences.
  • Testing rules override refactoring suggestions.
  • Style rules must never weaken architecture or testing guarantees.

Example:

If a refactor would improve code style but risks breaking tests or architecture constraints, the refactor must NOT be applied.


Safety Override

If any proposed change could:

  • break public APIs
  • modify database schema
  • change serializer contracts
  • introduce or modify migrations
  • alter business logic behavior

the agent must stop and request explicit confirmation from the user before proceeding.


Scope Limitation

Agents must not expand the scope of a task beyond the user’s request.

If a broader refactor seems beneficial, the agent must:

  1. Explain the proposed improvement
  2. Ask for explicit approval
  3. Wait for confirmation before implementing

Definition of Done

A change is considered complete only if ALL of the following are satisfied:

  • Ruff passes
  • mypy passes
  • pytest passes
  • behave scenarios pass
  • coverage >= 95%
  • No hidden behavior changes:
    Any behavior change must include a regression test that fails before the change and passes after.
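The "regression test that fails before the change" rule can be sketched as below; `normalize_phone` and its bug are hypothetical stand-ins for a real fix.

```python
# Hypothetical regression-test sketch: the test encodes the bug report,
# fails against the pre-fix implementation, and passes after the fix.
def normalize_phone(raw: str) -> str:
    """Fixed version: strips spaces AND dashes.
    (The buggy version only stripped spaces, so "55-1234" kept its dash.)"""
    return raw.replace(" ", "").replace("-", "")


def test_normalize_phone_strips_dashes():
    # Would fail before the fix, passes after it.
    assert normalize_phone("55-12 34") == "551234"


test_normalize_phone_strips_dashes()
```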

Additionally:

  • If cross-domain interactions were introduced or modified → DOMAIN_MAP.md must be updated.
  • If order state transitions were modified → ORDER_LIFECYCLE.md must be updated.
  • If architectural structure changed → REPO_MAP.md or ARCHITECTURE_DECISIONS.md must be updated.
  • If system overview changed → AI_CONTEXT.md must be updated.
  • If tasks/integrations are changed:
    idempotency must be documented and tested (double-execution safe).
  • If schema or migrations were changed → migrations must be included and verified.
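The "double-execution safe" requirement for tasks can be sketched like this. `mark_delivered` and the in-memory store are hypothetical stand-ins for a real Celery task and database row.

```python
# Idempotency sketch: running the same task twice must leave the same
# end state, with side effects fired exactly once.
ORDERS = {42: {"status": "in_transit", "delivered_events": 0}}


def mark_delivered(order_id: int) -> dict:
    """Idempotent task: a redelivered message is a no-op."""
    order = ORDERS[order_id]
    if order["status"] != "delivered":   # guard makes re-runs safe
        order["status"] = "delivered"
        order["delivered_events"] += 1   # side effect fires exactly once
    return order


mark_delivered(42)
mark_delivered(42)                       # simulated duplicate delivery
assert ORDERS[42] == {"status": "delivered", "delivered_events": 1}
```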

Rule Enforcement

If the agent detects that its own proposed change would violate:

  • ARCHITECTURE_GUARDRAILS.md
  • DOMAIN_MAP.md
  • ORDER_LIFECYCLE.md
  • ARCHITECTURE_DECISIONS.md

the agent MUST:

  1. Stop implementation
  2. Explain the conflict
  3. Request explicit approval from the user

Stop Conditions

The agent MUST stop and ask for input if:

  • The change would require schema/migration changes (restricted; requires explicit approval).
  • The change would alter public API contracts (restricted; requires explicit approval).
  • The change would modify serializer public fields (restricted; requires explicit approval).
javier@jm:~/proyectos/emprendimientos/lonca/shipii/src/Shipii$

I may not have enough experience, but I think this might be too much for Auto to handle. Do higher-powered models like Opus or Sonnet do better with the same prompt? You may be hitting a limitation of Auto.

It really doesn’t seem to be an issue with rules size/length. Lately, Auto and Composer 1.5 have been ignoring 90% of my rules, even after I trimmed them down into a couple of 30-line files. GPT-5.4 and all Claude models still follow all of my rules without a hitch - this is an issue exclusive to Auto and Composer 1.5.

It was fine a month or so ago, but now it’s just unusable: those models don’t do anything I tell them to in rules, and if I explicitly mention a rule in a conversation, they follow only that rule for one reply and then ignore it completely afterwards.


It’s a shame. I do see Auto getting confused and making mistakes more and more. I have found that Auto struggles more the longer a chat goes on. If you are forced to use Auto, then making new chats more often may be beneficial to keep it focused and on track.

I’ve given up. It can scaffold a feature, but that’s about the limit of Auto/Composer. I’ll set it a task now, and 4 out of 5 times I end up stashing the changes because it went on a side quest or didn’t fix the bug. I don’t trust anything it writes, and I can’t trust it to find bugs. It’s really disappointing, because I’ve been using it for a year or so. It just feels like the leadership at Cursor decided no one is going to want to use IDEs anymore, so they went on a side quest too, and now Cursor doesn’t work.

I’m likely cancelling my sub on renewal. I just don’t see this thing being worth the money.

I literally asked it to load some fonts from an assets folder. I tagged the folder, and this is what it immediately thought it should do. I just said to load the fonts from the assets folder and make sure they are referenced. I don’t have time for this; it’s just adding work.


I’m a long-time Cursor user, and honestly, this feels like the company changed the deal after users had already bought in.

One of the original appeals of Cursor was Auto mode. It felt like a practical feature you could lean on heavily, especially with the understanding that slower responses were part of the tradeoff. That model made sense, and it helped justify the subscription.

Now the experience feels very different. Prices are higher, limits feel tighter, and Auto mode often feels like a black box that routes users to weaker models without enough transparency. The result is inconsistent quality, unreliable code edits, and a much lower sense of trust in the product.

That is what makes this so frustrating: users were brought in with one understanding of the product, and then later the value proposition shifted. Paying more would be easier to accept if the service had clearly improved. Instead, it feels more opaque, more restrictive, and less reliable.

I’m not complaining just because something costs money. I’m complaining because the product feels worse while asking users to pay more. For long-time users, that feels like a betrayal of trust.

Cursor seriously needs to rethink how it communicates pricing, limits, and Auto mode behavior. Loyal users should not feel like they signed up for one product and ended up with another.

All these AI coding agent companies were way underselling their products when this all went mainstream over the past couple of years. $20/mo for unlimited requests from decent models was always an unrealistic expectation; no one can make that work financially. Even if it only cost Cursor $20 in API calls (it most likely doesn’t), the value the customer gets would be so much more than $20 that they would be foolish not to charge more. So in reality, we were in a honeymoon phase where things were intentionally underpriced to drive adoption of AI agents while these companies raced to claim market share. We are also being left with a handful of colossal tech companies for our AI agents (Anthropic, Cursor, Google), which means pricing will face even less competition.

Auto does not follow instructions as well as, for example, the GPT models. I just tested that today: I gave the same instruction to Auto and GPT 5.3, wanting to see whether they used shared memory, as they should have, when creating a plan. GPT did; Auto did not. That happens all the time. You need to stay alert when using Auto, or you run into trouble.

OK, I thought I was the only one… Auto has been unusable the past few days, and it’s making me burn through my quota of other models. If this continues I will have to fall back to Copilot, which is a shame because I much prefer Cursor, but I can’t justify spending money arguing with a model that clearly is not usable in production.

It frequently just straight-up ignores instructions, and instead of fixing the issues I ask it to, it modifies unit tests so they pass with the existing code rather than fixing the bug, even when the fix is laid out for it step by step.