Where does the bug appear (feature/product)?
Cursor IDE
Describe the Bug
In the built-in cursor-ide-browser MCP, repeated calls to browser_take_screenshot on a live, slowly-changing page return byte-identical cached PNGs for many seconds at a time, even when browser_snapshot (same tab, taken immediately before each screenshot) reports that the DOM has moved on to a new state. I.e. the screenshot tool is caching PNG output keyed on something (tab / viewId?) and only occasionally invalidating that cache, while browser_snapshot always returns live data.
This means the common agent pattern “snapshot to understand DOM, screenshot to show the user” silently produces disagreeing outputs.
Steps to Reproduce
Minimal synthetic repro (confirmed reproduces)
Save this as repro.html and serve it with any local static server (e.g. python3 -m http.server 4173, then navigate to http://localhost:4173/repro.html). It cycles its visible list between variant A (5 items) and variant B (6 items, extra “fig” row) every 3 seconds. The page’s <title> and a big #state div also reflect the current variant, so both browser_snapshot and an inspected PNG can unambiguously tell which variant is live.
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8" />
<title>Cursor Screenshot Staleness Repro</title>
<style>
body { font-family: -apple-system, Arial, sans-serif; margin: 40px; max-width: 760px; }
.panel { border: 1px solid #888; border-radius: 6px; background: #fff; width: 420px; padding: 8px 0; box-shadow: 0 4px 12px rgba(0,0,0,0.1); margin-top: 12px; }
.menu-item { padding: 10px 14px; border-bottom: 1px solid #eee; }
.menu-item:last-child { border-bottom: none; }
#state { font-family: monospace; color: #666; margin-top: 16px; font-size: 20px; font-weight: bold; }
</style>
</head>
<body>
<h1>Cursor `browser_take_screenshot` staleness repro</h1>
<div id="state">current variant: ?</div>
<div id="panel" class="panel" role="list"></div>
<script>
const A = ["apple", "banana", "cherry", "date", "elderberry"];
const B = ["apple", "banana", "cherry", "date", "elderberry", "fig"];
const panel = document.getElementById("panel");
const stateEl = document.getElementById("state");
let useB = false;
function render() {
const items = useB ? B : A;
panel.innerHTML = items.map((s) => `<div class="menu-item" role="listitem">${s}</div>`).join("");
stateEl.textContent = `current variant: ${useB ? "B (6 items, fig present)" : "A (5 items, no fig)"}`;
document.title = useB ? "Variant B" : "Variant A";
}
render();
setInterval(() => { useB = !useB; render(); }, 3000);
</script>
</body>
</html>
Repro steps:
- browser_navigate → http://localhost:4173/repro.html
- browser_lock
- Loop ~10 times: browser_snapshot, then browser_take_screenshot. Wait 2–4 seconds between iterations so the variant has chances to flip.
- For each pair, record snapshot.pageTitle (“Variant A” vs “Variant B”) and the on-disk PNG’s #state line. Also shasum the PNG to bucket them by content.
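The bucketing step can also be scripted rather than done by hand with shasum. The following is a minimal Node.js sketch, not part of the original run: the function names and the idea of pointing it at a screenshot directory are assumptions for illustration. It groups PNG files by SHA-1 so byte-identical captures fall into the same bucket.

```javascript
const crypto = require("crypto");
const fs = require("fs");
const path = require("path");

// Hex SHA-1 of a Buffer; this is the same digest `shasum <file>` prints by default.
function sha1Hex(buffer) {
  return crypto.createHash("sha1").update(buffer).digest("hex");
}

// Group [name, Buffer] pairs by content hash. Returns Map<sha1, string[]>.
function bucketByHash(entries) {
  const buckets = new Map();
  for (const [name, buffer] of entries) {
    const hash = sha1Hex(buffer);
    if (!buckets.has(hash)) buckets.set(hash, []);
    buckets.get(hash).push(name);
  }
  return buckets;
}

// Read every .png in a directory (the directory is whatever folder the
// screenshots were written to; that location is an assumption) and bucket it.
function bucketPngsInDir(dir) {
  const names = fs.readdirSync(dir).filter((n) => n.endsWith(".png")).sort();
  return bucketByHash(names.map((n) => [n, fs.readFileSync(path.join(dir, n))]));
}
```

Calling bucketPngsInDir on the screenshot folder and printing each bucket's file list shows at a glance how many distinct images were actually produced.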
Expected Behavior
Observed on my run (Cursor 3.3.30)
Eight screenshot calls spread across ~12 minutes on the same locked tab, against the repro page above. Confirmed by reading the files back with Read (which picks up actual disk content, not the chat-inline preview) and cross-checking with shasum:
| # | Wallclock | Snapshot pageTitle | PNG sha1 (first 7) | PNG shows variant |
|---|---|---|---|---|
| 1 | 11:13:31 | A (pre-navigate) | 768ab15 | B |
| 2 | 11:14:46 | A | 128af81 | A |
| 3 | 11:16:11 | A | 128af81 | A |
| 4 | 11:18:08 | A | 128af81 | A |
| 5 | 11:19:43 | A | 128af81 | A |
| 6 | 11:21:25 | B | 128af81 | A |
| 7 | 11:23:15 | A | 768ab15 | B |
| 8 | 11:25:08 | A | 128af81 | A |
Key things this table shows:
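- Captures 2–6 are byte-identical PNG files (same sha1), despite being taken roughly 1.5–2 minutes apart over a span of about 6m 40s (11:14:46 to 11:21:25). With the page flipping every 3 seconds, the DOM’s variant should have changed well over a hundred times in that window.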
- At iteration 6, the snapshot sees “Variant B” but the PNG is the still-cached “Variant A” file from iteration 2.
- At iteration 7, the snapshot sees “Variant A” but the PNG file has finally updated to “Variant B” — i.e. the cache updated but is now behind in the other direction.
- The screenshot tool is clearly not taking a fresh capture on every call; it’s returning a stale cached file most of the time, and sometimes that cached file disagrees with the live DOM (as reported by browser_snapshot and by subsequent browser_click results against the same refs, which always land correctly).
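The same conclusions can be derived mechanically from the table. This is a small sketch with the eight rows transcribed by hand; the field names and helper functions are illustrative only. It lists the iterations where snapshot and PNG disagree and finds the longest run of identical hashes.

```javascript
// Rows transcribed from the table above. `preNavigate` marks row 1,
// whose snapshot is labelled "A (pre-navigate)" in the table.
const rows = [
  { n: 1, snapshot: "A", sha1: "768ab15", png: "B", preNavigate: true },
  { n: 2, snapshot: "A", sha1: "128af81", png: "A" },
  { n: 3, snapshot: "A", sha1: "128af81", png: "A" },
  { n: 4, snapshot: "A", sha1: "128af81", png: "A" },
  { n: 5, snapshot: "A", sha1: "128af81", png: "A" },
  { n: 6, snapshot: "B", sha1: "128af81", png: "A" },
  { n: 7, snapshot: "A", sha1: "768ab15", png: "B" },
  { n: 8, snapshot: "A", sha1: "128af81", png: "A" },
];

// Iterations where the snapshot and the PNG show different variants.
function mismatches(rows) {
  return rows.filter((r) => r.snapshot !== r.png).map((r) => r.n);
}

// Longest run of consecutive iterations that share one PNG hash.
function longestIdenticalRun(rows) {
  let best = { sha1: null, from: null, to: null, length: 0 };
  let start = 0;
  for (let i = 1; i <= rows.length; i++) {
    // Close the current run at the end of the data or when the hash changes.
    if (i === rows.length || rows[i].sha1 !== rows[start].sha1) {
      const length = i - start;
      if (length > best.length) {
        best = { sha1: rows[start].sha1, from: rows[start].n, to: rows[i - 1].n, length };
      }
      start = i;
    }
  }
  return best;
}

console.log(mismatches(rows));          // iterations with disagreeing outputs
console.log(longestIdenticalRun(rows)); // the 2–6 run of 128af81
```

Row 1 also shows up as a mismatch, but its snapshot predates navigation, so iterations 6 and 7 are the ones that demonstrate the bug.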
I’ve attached:
- repro-variant-A.png: the sha1 128af81 file from iteration 2, byte-identical to the cached files at 3/4/5/6.
- repro-variant-B.png: the sha1 768ab15 file from iteration 1, byte-identical to iteration 7.
Both files are generated entirely from the HTML repro above — no proprietary content.
Screenshots / Screen Recordings
Operating System
MacOS
Version Information
- Cursor: 3.3.30 (macOS build)
- OS: macOS 26.4.1 (25E253)
- MCP: built-in cursor-ide-browser
- Tool: browser_take_screenshot (descriptor: “Take a screenshot of the current page. You can’t perform actions based on the screenshot, use browser_snapshot for actions.”)
Additional Information
Side observation: inline preview in agent chat can also be stale
Independently of the on-disk issue, the image preview returned inline in the tool result (rendered to the agent and user in chat) sometimes shows a different image from what’s actually on disk at the same filename. I observed cases where Reading the PNG back gave one image while the inline result preview had shown a different, older one. This is a separate caching layer and less critical to fix, but worth mentioning since it can mislead agents into thinking a screenshot matches the snapshot when it doesn’t.
Why I think this is a screenshot-tool bug
- browser_snapshot is consistent with subsequent browser_click / browser_search results on the same tab: refs resolve, clicks land, text searches match. So the snapshot is live.
- The on-disk PNG hash doesn’t change across many calls even as wall clock advances and the DOM cycles variants, so the screenshot tool is returning cached output rather than capturing a fresh frame per call.
- When the cache does refresh, it sometimes refreshes to a variant that has already been replaced — so the capture pipeline is not a simple “live frame buffer” either.
Why it matters
“Snapshot to understand DOM, screenshot to show the user” is one of the most common agent workflows with this MCP. When the two disagree silently:
- The agent reports the wrong UI state to the user (e.g. “here’s the fix working” with a screenshot that’s actually the old state).
- Agents burn tokens in retry loops trying to reconcile snapshot and screenshot.
- Teams end up documenting brittle workarounds (“always Read the PNG back and manually compare to the snapshot before trusting it”).
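To make the cost of such workarounds concrete, here is a rough sketch of what a "retry until the file changes" wrapper might look like. Everything in it is hypothetical: `capture` stands in for whatever the agent harness does to call browser_take_screenshot and read the PNG bytes back from disk, and the retry counts are arbitrary.

```javascript
const crypto = require("crypto");

function sha1Hex(buffer) {
  return crypto.createHash("sha1").update(buffer).digest("hex");
}

// `capture` is a hypothetical async callback that triggers a screenshot and
// resolves to the PNG bytes read back from disk. Retries until the hash
// differs from `previousHash` or the attempts run out.
async function captureUntilChanged(capture, previousHash, { attempts = 3, delayMs = 500 } = {}) {
  for (let i = 0; i < attempts; i++) {
    const png = await capture();
    const hash = sha1Hex(png);
    if (hash !== previousHash) {
      return { png, hash, changed: true, attempts: i + 1 };
    }
    // Pause between attempts, but not after the final one.
    if (i < attempts - 1) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return { png: null, hash: previousHash, changed: false, attempts };
}
```

Even this is not a real fix: a changed hash only proves the file is different, not that it matches the live DOM (iteration 7 in the table changed and was still wrong), and an unchanged hash is legitimate whenever the page really has not changed.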
Suggestions
- Make browser_take_screenshot always capture a fresh frame on every call, or at minimum invalidate any per-tab/per-viewId PNG cache before writing.
- If caching is intentional for performance, expose a fresh: true argument so the agent can opt in to a forced re-capture.
- Also relevant: the inline preview rendered in the tool result is sometimes a different image from the PNG written to disk under the returned filename. Serving the same bytes through both channels would be an easy consistency win.
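As a purely illustrative sketch of the first two suggestions (I have no visibility into how the tool is actually implemented, so every name here is made up), a capture wrapper could default to no caching and still honour a fresh flag when a cache window is configured:

```javascript
// Hypothetical wrapper, not Cursor's implementation. `captureFrame(tabId)` is
// an assumed function that grabs a live frame; `now` is injectable for testing.
function makeScreenshotTool(captureFrame, { maxAgeMs = 0, now = Date.now } = {}) {
  const cache = new Map(); // tabId -> { png, takenAt }

  return function takeScreenshot(tabId, { fresh = false } = {}) {
    const hit = cache.get(tabId);
    // Serve from cache only if caching is enabled, the entry is young enough,
    // and the caller did not ask for a forced re-capture.
    if (!fresh && hit && now() - hit.takenAt < maxAgeMs) {
      return hit.png;
    }
    const png = captureFrame(tabId);
    cache.set(tabId, { png, takenAt: now() });
    return png;
  };
}
```

With the default maxAgeMs of 0 every call captures a new frame, which is the first suggestion; a non-zero window plus fresh: true corresponds to the second.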
Happy to provide more detail or a screen recording if helpful.
Does this stop you from using Cursor
No - Cursor works, but with this issue

