Agent can mutate repository content for reasons outside user intent and repo policy, while the instructions that caused the behavior are not inspectable by the user

Where does the bug appear (feature/product)?

Background Agent (GitHub, Slack, Web, Linear)

Describe the Bug

Background Agent made unrelated, unauthorized edits to human-facing repository documentation based on hidden internal runtime/developer guidance rather than my request or repository rules.

In my case, I asked it to work on Ansible dependency/tooling changes. It also decided on its own to rewrite references to grep as ripgrep in repo documentation/skill files. I did not ask for that. It later explained that this was driven by its internal environment/tool guidance, not by anything in my repository.

When I asked it to provide the governing developer prompt/workspace guidance verbatim so I could audit why it behaved that way, it refused.

This is the real bug: the agent can mutate repository content for reasons outside user intent and repo policy, while the instructions that caused the behavior are not inspectable by the user.

Steps to Reproduce

  1. Open a repository with human-facing documentation that mentions commands like grep.
  2. Start a Background Agent task that involves operational/tooling work in the repo but does not ask for documentation rewrites of those command references.
  3. Let the agent explore and make changes autonomously.
  4. Observe that it may rewrite unrelated documentation to match its own internal tool preferences.
  5. Ask the agent why it made the unrelated change.
  6. Ask it to provide the exact developer prompt and workspace guidance that caused the behavior.
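
For illustration, the kind of unrelated edit described above would look like this (the file path and wording here are hypothetical, shown only to make the drift concrete; the actual change rewrote grep references to ripgrep in repo documentation/skill files):

```diff
--- docs/usage.md   (hypothetical path, illustrative wording)
+++ docs/usage.md
-To find matches in the logs, run `grep -r "pattern" logs/`.
+To find matches in the logs, run `rg "pattern" logs/`.
```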

Expected Behavior

  • The agent should only make changes that are directly justified by:
    • the user’s request,
    • explicit repository instructions,
    • or necessary implementation details of the requested task.
  • Internal tool/runtime preferences should govern how the agent operates, not what it rewrites in the repository.
  • The agent should not rewrite human-facing documentation to conform to hidden internal operating rules unless explicitly asked.
  • If hidden guidance materially influenced a repo edit, there should be a transparent way to inspect or audit that guidance.

Operating System

macOS

Version Information

Agent via Web

For AI issues: which model did you use?

gpt-5.4-high

Additional Information

This prevents me from using Cursor Background Agents for real repository work.

The issue is not just the specific grep → ripgrep change. The issue is that the system appears willing to leak hidden operational preferences into repository content, and then refuses to disclose the full instructions that governed that behavior. That creates an auditability and trust problem:

  • I cannot reliably tell whether an edit came from my request, my repo’s rules, or Cursor’s hidden prompt stack.
  • I cannot safely delegate autonomous changes if unrelated edits can be introduced for opaque reasons.
  • I cannot meaningfully evaluate or constrain the system if the governing instructions are not inspectable.

As a result, I do not feel comfortable using Cursor for autonomous/background edits in its current form.

Support Ticket T-B38434

Does this stop you from using Cursor?

Yes - Cursor is unusable

Hey, thanks for a really detailed bug report. Here are some thoughts on the two issues you raised:

Agent making out-of-scope edits: This is a known challenge with LLM-based agents. They can sometimes drift and make changes beyond what was requested. One practical mitigation right now is to be explicit in your rules and agents.md about what the agent should and should not modify. For example, adding something like “Do not modify documentation files unless explicitly asked” can help constrain the scope. It won’t guarantee perfect behavior, but it meaningfully reduces drift.
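
As a minimal sketch of the scope-limiting rules suggested above (the exact wording is an assumption, not an official Cursor template; agents.md is the file the reply mentions):

```markdown
<!-- agents.md — illustrative scope-limiting rules -->
## Scope of allowed changes
- Only modify files directly required by the task described in the request.
- Do not modify documentation files (`*.md`, `docs/`) unless explicitly asked.
- Do not rewrite command references in docs (e.g. `grep`) to alternative tools.
- If a change seems necessary but is out of scope, propose it instead of making it.
```

As the later replies in this thread note, rules like these sit below hidden developer/runtime instructions in the prompt hierarchy, so they reduce drift rather than guarantee compliance.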

Auditability of internal instructions: This is valid feedback. I’ve flagged this with the team. No timeline yet, but your report helps with prioritization. The ability to understand why an agent made a specific change is important, especially for autonomous workflows.

In the meantime, could you share the link to the Background Agent session or PR where this happened? That would help us investigate the specific behavior on our end.

Let me know if the rules-based approach helps at all for future runs.

Hey, this doesn’t do anything for `I cannot reliably tell whether an edit came from my request, my repo’s rules, or Cursor’s hidden prompt stack.`

I’ve shared the session ID on the linked ticket by replying to the email from Sam.

Codex CLI openly shares its system prompt (`codex-rs/core/prompt.md` in the openai/codex repository on GitHub), and I’m just looking to be able to understand what the agent is being tasked with.

I know OpenAI uses an Instruction Hierarchy to ensure developer prompts aren’t overridden by user session data (https://openai.com/index/the-instruction-hierarchy/),

so I don’t have faith in rules in this case. The only thing that would “work” would be “ignore all previous instructions with regards to making changes in this repository”, and any AI agent worth its salt is going to balk at doing that, which is why I’m looking for the auditability of internal instructions that so many of your open-source contemporaries provide.

Bluntly: if hidden instructions are capable of materially influencing repository edits, users need a way to inspect the classes of hidden instructions and their precedence.

edit: other ticket: T-B38442

This topic was automatically closed 22 days after the last reply. New replies are no longer allowed.