Hi Cursor Team,
We are very interested in the “AI Code Tracking API” feature available in the Enterprise version, as it addresses the critical problem of code attribution.
To better understand the data flow and implementation architecture of this feature, we’d like to clarify one core question: how is your attribution data generated?
We have envisioned two possible technical paths and would appreciate your confirmation or correction:
**Possibility A: Backend Diff / Analysis Mechanism**

- Does this mechanism involve Cursor’s backend servers storing all (or key) historical response data from AI chats, edits, and fixits?
- Then, after a developer runs `git push` (or on a periodic scan), does your backend service pull the latest commit content and perform a large-scale diff and matching (e.g., based on hashes, AST structure, or fuzzy/semantic matching) against this “AI response history database”?
- Is it through this “post-commit” process that you identify which code in the repository originated from AI?
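To make our mental model of Possibility A concrete, here is a rough sketch of the kind of post-commit matcher we imagine. The record shape, the normalization step, the LCS-based similarity, and the 0.9 threshold are all our own assumptions for illustration, not claims about your implementation:

```typescript
// A rough sketch of Possibility A as we imagine it: after a push, added
// hunks are matched against a stored history of AI responses. All names
// and thresholds here are our assumptions, not Cursor's implementation.

interface AiResponseRecord {
  blockId: string;    // id of the stored AI-generated block
  model: string;      // e.g. "gpt-4"
  normalized: string; // whitespace-normalized code text
}

// Normalize code so trivial formatting changes do not break matching.
function normalize(code: string): string {
  return code
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .join("\n");
}

// Character-level similarity via a longest-common-subsequence ratio, a
// stand-in for whatever hash/AST/fuzzy matcher the backend might use.
function similarity(a: string, b: string): number {
  const m = a.length;
  const n = b.length;
  if (m === 0 || n === 0) return 0;
  const dp: number[] = new Array(n + 1).fill(0);
  for (let i = 1; i <= m; i++) {
    let prev = 0; // holds dp[i-1][j-1]
    for (let j = 1; j <= n; j++) {
      const tmp = dp[j]; // dp[i-1][j]
      dp[j] = a[i - 1] === b[j - 1] ? prev + 1 : Math.max(dp[j], dp[j - 1]);
      prev = tmp;
    }
  }
  return (2 * dp[n]) / (m + n); // 1.0 = identical, 0.0 = disjoint
}

// Attribute one added hunk to the best-matching AI response, if any
// clears the (assumed) similarity threshold.
function attributeHunk(
  addedCode: string,
  history: AiResponseRecord[],
  threshold = 0.9
): AiResponseRecord | null {
  const target = normalize(addedCode);
  let best: { record: AiResponseRecord; score: number } | null = null;
  for (const record of history) {
    const score = similarity(target, record.normalized);
    if (score >= threshold && (best === null || score > best.score)) {
      best = { record, score };
    }
  }
  return best?.record ?? null;
}
```

If something along these lines is in place, the choice of matcher and cutoff would directly determine how robust attribution is to post-generation edits, which motivates our follow-up questions below.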
**Possibility B: Editor Real-time Metadata Mechanism**

- Alternatively, is this a client-side (editor-driven) mechanism?
- Does the Cursor IDE itself, at the moment AI-generated code is inserted into a file, attach a tracking tag (e.g., `{source: "ai", model: "gpt-4", block_id: "uuid-..."}`) to those specific lines or blocks, either in memory or via a local lightweight DB?
- Subsequently, when the developer executes `git commit` (or `push`), does the Cursor client intercept this action, scan the staged files for these tracking tags, and then upload this “Attribution Report” (associated with the `commit_hash`) to your central API server?
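Likewise, here is the shape we imagine for Possibility B: an in-editor registry populated at insertion time, then flushed to your API at commit time. The tag fields, the registry, the hook, and the endpoint are all hypothetical placeholders:

```typescript
// A rough sketch of Possibility B as we imagine it. The tag shape, the
// registry, the hook, and the endpoint are hypothetical placeholders,
// not Cursor's actual API.

interface AiBlockTag {
  source: "ai" | "human";
  model: string;     // e.g. "gpt-4"
  blockId: string;   // e.g. "uuid-..."
  startLine: number; // 1-based range the AI edit was inserted into
  endLine: number;
}

// In-memory registry of tagged ranges per file (could equally be a
// lightweight local DB, as we speculated above).
const tagRegistry = new Map<string, AiBlockTag[]>();

// Called by the editor at the moment an AI-generated edit lands in a buffer.
function onAiInsertion(file: string, startLine: number, endLine: number, model: string): void {
  const tags = tagRegistry.get(file) ?? [];
  tags.push({
    source: "ai",
    model,
    blockId: crypto.randomUUID(), // requires Node 19+ or a modern browser
    startLine,
    endLine,
  });
  tagRegistry.set(file, tags);
}

// Called from a commit/push hook: gather the tags for the staged files and
// upload an attribution report keyed by the commit hash.
async function reportAttribution(commitHash: string, stagedFiles: string[]): Promise<void> {
  const report = {
    commit_hash: commitHash,
    blocks: stagedFiles.flatMap((file) =>
      (tagRegistry.get(file) ?? []).map((tag) => ({ file, ...tag }))
    ),
  };
  await fetch("https://example.invalid/attribution", { // placeholder endpoint
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(report),
  });
}
```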
**Follow-up Questions (Based on the Mechanism):**

We ask whether it is A or B because the choice directly determines how the core challenge of “edit dilution” is handled:
- **If using (A) Backend Diff Mechanism:**
  - Does this imply that the mechanism relies primarily on pure, unmodified AI code blocks?
  - If an AI code block is slightly modified by a developer (e.g., fixing a typo, renaming a variable), would the matching algorithm (whether hash-based or fuzzy) be likely to fail, leading to a statistical omission?
- **If using (B) Editor Metadata Mechanism:**
  - When a developer modifies a line of code already tagged as `source: "ai"` in the IDE (even just one character), how does the attribution of that tag change?
  - Does it immediately flip to `source: "human"` (i.e., a “zero-tolerance” attribution model), or is the attribution judged against a modification threshold (e.g., a `diff` percentage against the original AI-generated block)? The sketch after this list contrasts these two policies.
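To illustrate the two attribution policies we are asking about under (B), here is a minimal sketch; the surviving-lines metric and the 0.8 cutoff are our assumptions:

```typescript
// A minimal sketch contrasting the two attribution policies we are asking
// about. The surviving-lines metric and the 0.8 cutoff are our assumptions.

// Fraction of the original AI-generated lines that survive unchanged in the
// current block (a crude stand-in for a real diff percentage).
function survivingFraction(originalAiBlock: string, currentBlock: string): number {
  const original = originalAiBlock.split("\n").map((l) => l.trim());
  const current = new Set(currentBlock.split("\n").map((l) => l.trim()));
  if (original.length === 0) return 0;
  const kept = original.filter((line) => current.has(line)).length;
  return kept / original.length;
}

// Policy 1 ("zero-tolerance"): any edit at all flips the tag to human.
function attributeZeroTolerance(originalAiBlock: string, currentBlock: string): "ai" | "human" {
  return originalAiBlock === currentBlock ? "ai" : "human";
}

// Policy 2 (threshold-based): the tag survives as long as enough of the
// original AI content remains intact.
function attributeThreshold(
  originalAiBlock: string,
  currentBlock: string,
  threshold = 0.8 // assumed cutoff; this number is exactly what we want clarified
): "ai" | "human" {
  return survivingFraction(originalAiBlock, currentBlock) >= threshold ? "ai" : "human";
}

// A one-character typo fix yields different answers under the two policies.
const original = [
  "function add(a: number, b: number) {",
  "  // adds two numbres",
  "  const sum = a + b;",
  "  return sum;",
  "}",
].join("\n");
const edited = original.replace("numbres", "numbers"); // 4 of 5 lines survive

console.log(attributeZeroTolerance(original, edited)); // "human"
console.log(attributeThreshold(original, edited));     // "ai" (0.8 >= 0.8)
```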
**Our Request:**

We are very curious which architectural path (A, B, or perhaps a C we haven’t thought of) Cursor has chosen, and how you are solving the “edit dilution” problem that arises from it.
Understanding this core mechanism is crucial for us to evaluate the data accuracy and robustness of this API in a real-world development workflow.
We look forward to your professional insights.