Prefer workspace files over training data when populating structured content

Feature request for product/service

Cursor IDE

Describe the request

Problem

When an agent is asked to expand or populate structured content (tables, scored matrices, configuration mappings) that derives from another file in the workspace, it frequently generates plausible-sounding content from training knowledge instead of reading the actual source file.

Example: I have a requirements canvas (requirements.canvas.tsx) with 178 numbered requirements (FR-1.1 through FR-13.5, NFR-1.1 through NFR-12.6). A separate evaluation canvas scores each requirement. When I asked the agent to “show all requirements by number, not only some of them,” it expanded the tables from a subset to all rows — but generated 41 of the 178 requirement labels from generic industry knowledge instead of reading them from the requirements file that was right there in the workspace.

The fabricated labels were fluent and structurally correct (“P99 latency under load”, “Activity completion criteria”, “API authentication (OAuth2/OIDC)”), so the error was invisible without a line-by-line cross-reference against the source file. It took several review cycles before we caught it.
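
For anyone who wants to automate that cross-reference, here is a minimal sketch of the check. It assumes each requirement can be reduced to a line of the form `FR-1.1: label`; that format, the function names, and the sample data are all hypothetical, and the real canvas file would need its own extraction step.

```python
import re

# ASSUMPTION: requirements appear as "FR-1.1: Some label" lines.
# The real requirements.canvas.tsx layout is not shown in this post.
REQ_LINE = re.compile(r"^\s*((?:FR|NFR)-\d+\.\d+)\s*:\s*(.+?)\s*$")


def parse_requirements(text: str) -> dict[str, str]:
    """Map requirement ID -> label for every line matching REQ_LINE."""
    found = {}
    for line in text.splitlines():
        match = REQ_LINE.match(line)
        if match:
            found[match.group(1)] = match.group(2)
    return found


def find_mismatches(source: dict[str, str], generated: dict[str, str]) -> list[str]:
    """Return generated IDs whose label is missing from, or differs from, the source."""
    return sorted(
        req_id
        for req_id, label in generated.items()
        if source.get(req_id) != label
    )


# Illustrative data only, not taken from the real files.
source_text = """
FR-1.1: User can create a workspace
FR-1.2: User can rename a workspace
NFR-1.1: Pages load within two seconds
"""
generated_text = """
FR-1.1: User can create a workspace
FR-1.2: Workspace lifecycle management
NFR-1.1: Pages load within two seconds
NFR-9.9: P99 latency under load
"""

mismatches = find_mismatches(
    parse_requirements(source_text),
    parse_requirements(generated_text),
)
print(mismatches)  # ['FR-1.2', 'NFR-9.9']
```

The comparison is an exact string match on purpose: a paraphrased label counts as a mismatch, which is the failure described here.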

Why this matters

  • Silent data corruption. The fabricated content looks correct and passes casual review. Scores were computed against requirements that don’t exist.
  • Compounds over iterations. In long conversations with iterative edits, each expansion introduces more drift from the source. By the time someone audits, dozens of entries may be wrong.
  • No user expects this. When a source file exists in the workspace and the agent is asked to populate content from it, users assume the agent will read the file rather than invent content.

Suggested behaviour

When the agent is asked to populate, expand, or update structured content and a plausible source file exists in the workspace:

  1. Read the source file before generating. The agent should identify the source file and read the relevant sections before writing any rows, labels, or identifiers.
  2. Copy identifiers verbatim. IDs, requirement names, headings, and labels should be copied from the source — not paraphrased or substituted with equivalent concepts from training data.
  3. Flag gaps explicitly. If the source doesn’t contain enough information to fill a field, the agent should say so rather than filling it with a plausible guess.
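
The three steps above could look roughly like this in code. This is only a sketch of the intended behaviour, not a description of how the agent works internally: `Row`, `populate_rows`, and the `source` mapping (assumed to have been parsed from the source file beforehand) are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    req_id: str
    label: Optional[str]  # None means the ID was not found in the source
    note: str = ""


def populate_rows(requested_ids: list[str], source: dict[str, str]) -> list[Row]:
    """Build one row per requested ID using only what the source contains.

    ASSUMPTION: `source` maps requirement ID -> label and was read from the
    workspace file before this function is called (step 1).
    """
    rows = []
    for req_id in requested_ids:
        if req_id in source:
            # Step 2: copy the label verbatim, no paraphrasing.
            rows.append(Row(req_id, source[req_id]))
        else:
            # Step 3: flag the gap instead of filling in a plausible guess.
            rows.append(Row(req_id, None, "MISSING IN SOURCE: needs human input"))
    return rows


# Illustrative data only.
source = {"FR-1.1": "User can create a workspace"}
rows = populate_rows(["FR-1.1", "FR-2.1"], source)
for row in rows:
    print(row.req_id, row.label, row.note)
```

The point of the `None` label is that a missing entry stays visibly missing, so a reviewer sees the gap rather than a fluent placeholder.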

Current workaround

I’ve created a .cursor/rules/no-fabrication.mdc rule with alwaysApply: true that instructs the agent to read source files before populating structured content. It helps, but it’s a workaround — this should be default agent behaviour, not something each user needs to discover and configure after encountering the failure mode.

Environment

  • Cursor with background agents
  • Canvas files (.canvas.tsx)
  • Long multi-turn conversations with iterative document refinement

We really need this because it happens a lot. At the very least, the problem should be addressed some other way, if a better solution exists.

Hey @Johan_Snyman

Curious which models you were using here? I expect the behavior will vary quite a bit from model to model!

I was using claude-4.6-opus-high-thinking in MAX mode.