`browser_take_screenshot` (cursor-ide-browser MCP) serves cached PNGs that disagree with `browser_snapshot` on the same tab

Where does the bug appear (feature/product)?

Cursor IDE

Describe the Bug

In the built-in cursor-ide-browser MCP, repeated calls to browser_take_screenshot on a live, slowly-changing page return byte-identical PNGs for minutes at a time (over six minutes in the run below), even when browser_snapshot (same tab, taken immediately before each screenshot) reports that the DOM has moved on to a new state. In other words, the screenshot tool appears to cache its PNG output keyed on something (tab or viewId?) and invalidates that cache only occasionally, while browser_snapshot always returns live data.

This means the common agent pattern “snapshot to understand DOM, screenshot to show the user” silently produces disagreeing outputs.

Steps to Reproduce

Minimal synthetic repro (confirmed reproduces)

Save this as repro.html and serve it with any local static server (e.g. python3 -m http.server 4173, then navigate to http://localhost:4173/repro.html). It cycles its visible list between variant A (5 items) and variant B (6 items, extra “fig” row) every 3 seconds. The page’s <title> and a big #state div also reflect the current variant, so both browser_snapshot and an inspected PNG can unambiguously tell which variant is live.

<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8" />
    <title>Cursor Screenshot Staleness Repro</title>
    <style>
      body { font-family: -apple-system, Arial, sans-serif; margin: 40px; max-width: 760px; }
      .panel { border: 1px solid #888; border-radius: 6px; background: #fff; width: 420px; padding: 8px 0; box-shadow: 0 4px 12px rgba(0,0,0,0.1); margin-top: 12px; }
      .menu-item { padding: 10px 14px; border-bottom: 1px solid #eee; }
      .menu-item:last-child { border-bottom: none; }
      #state { font-family: monospace; color: #666; margin-top: 16px; font-size: 20px; font-weight: bold; }
    </style>
  </head>
  <body>
    <h1>Cursor `browser_take_screenshot` staleness repro</h1>
    <div id="state">current variant: ?</div>
    <div id="panel" class="panel" role="list"></div>
    <script>
      const A = ["apple", "banana", "cherry", "date", "elderberry"];
      const B = ["apple", "banana", "cherry", "date", "elderberry", "fig"];
      const panel = document.getElementById("panel");
      const stateEl = document.getElementById("state");
      let useB = false;
      function render() {
        const items = useB ? B : A;
        panel.innerHTML = items.map((s) => `<div class="menu-item" role="listitem">${s}</div>`).join("");
        stateEl.textContent = `current variant: ${useB ? "B (6 items, fig present)" : "A (5 items, no fig)"}`;
        document.title = useB ? "Variant B" : "Variant A";
      }
      render();
      setInterval(() => { useB = !useB; render(); }, 3000);
    </script>
  </body>
</html>

Repro steps:

  1. browser_navigate to http://localhost:4173/repro.html
  2. browser_lock
  3. Loop ~10 times: browser_snapshot, then browser_take_screenshot. Wait 2–4 seconds between iterations so the variant has chances to flip.
  4. For each pair, record snapshot.pageTitle (“Variant A” vs “Variant B”) and the on-disk PNG’s #state line. Also shasum the PNG to bucket them by content.
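
The bucketing in step 4 can be sketched as a small Node.js script. This is a minimal sketch, assuming Node.js is available and the screenshots have been saved locally; the function names `sha1Hex` and `bucketByHash` are illustrative and not part of any Cursor API.

```javascript
// Group screenshot files by SHA-1 so byte-identical captures share a bucket.
const crypto = require("node:crypto");
const fs = require("node:fs");

// Hash raw bytes; the hex digest is the same value `shasum <file>` prints.
function sha1Hex(bytes) {
  return crypto.createHash("sha1").update(bytes).digest("hex");
}

// Return a Map from sha1 digest to the list of paths with that exact content.
function bucketByHash(paths) {
  const buckets = new Map();
  for (const p of paths) {
    const digest = sha1Hex(fs.readFileSync(p));
    if (!buckets.has(digest)) buckets.set(digest, []);
    buckets.get(digest).push(p);
  }
  return buckets;
}
```

Calling `bucketByHash` with the saved screenshot paths and printing each digest's first seven characters next to its file list gives the same grouping as running shasum by hand. A bucket holding several captures taken minutes apart on a page that changes every 3 seconds is the signal of interest.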

Expected Behavior

Each browser_take_screenshot call should capture a fresh frame, so the PNG shows the same variant that the browser_snapshot taken immediately before it reports.

Observed on my run (Cursor 3.3.30)

Eight screenshot calls spread across ~12 minutes on the same locked tab, against the repro page above. Confirmed by reading the files back with Read (which picks up actual disk content, not the chat-inline preview) and cross-checking with shasum:

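| # | Wallclock | Snapshot pageTitle | PNG sha1 (first 7) | PNG shows variant |
|---|-----------|--------------------|--------------------|-------------------|
| 1 | 11:13:31 | A (pre-navigate) | 768ab15 | B |
| 2 | 11:14:46 | A | 128af81 | A |
| 3 | 11:16:11 | A | 128af81 | A |
| 4 | 11:18:08 | A | 128af81 | A |
| 5 | 11:19:43 | A | 128af81 | A |
| 6 | 11:21:25 | B | 128af81 | A ❌ |
| 7 | 11:23:15 | A | 768ab15 | B ❌ |
| 8 | 11:25:08 | A | 128af81 | A |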

Key things this table shows:

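  • Captures 2–6 are byte-identical PNG files (same sha1), even though consecutive captures were taken roughly 1.5–2 minutes apart and the span from capture 2 to capture 6 is about 6m 40s. During that window the DOM’s variant flipped many times (nominally once every 3 seconds).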
  • At iteration 6, the snapshot sees “Variant B” but the PNG is the still-cached “Variant A” file from iteration 2.
  • At iteration 7, the snapshot sees “Variant A” but the PNG file has finally updated to “Variant B” — i.e. the cache updated but is now behind in the other direction.
  • The screenshot tool is clearly not taking a fresh capture on every call; it’s returning a stale cached file most of the time, and sometimes that cached file disagrees with the live DOM (as reported by browser_snapshot and by subsequent browser_click results against the same refs, which always land correctly).
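
The two checks behind these observations can be written down explicitly. This is a minimal sketch over the rows transcribed from the table above; the field names (`title`, `sha1`, `png`, `preNavigate`) and both helper functions are illustrative. Row 1 is excluded from the mismatch check because its snapshot was taken before navigation.

```javascript
// Rows from the observed run: snapshot title vs. variant visible in the PNG.
const runs = [
  { n: 1, title: "A", sha1: "768ab15", png: "B", preNavigate: true },
  { n: 2, title: "A", sha1: "128af81", png: "A" },
  { n: 3, title: "A", sha1: "128af81", png: "A" },
  { n: 4, title: "A", sha1: "128af81", png: "A" },
  { n: 5, title: "A", sha1: "128af81", png: "A" },
  { n: 6, title: "B", sha1: "128af81", png: "A" },
  { n: 7, title: "A", sha1: "768ab15", png: "B" },
  { n: 8, title: "A", sha1: "128af81", png: "A" },
];

// A capture mismatches when the PNG shows a different variant than the
// snapshot taken immediately before it reported.
function mismatches(rows) {
  return rows.filter((r) => !r.preNavigate && r.title !== r.png).map((r) => r.n);
}

// Longest streak of consecutive captures sharing one hash. A long streak on a
// page that changes every 3 seconds hints at cached output.
function longestIdenticalRun(rows) {
  let best = 0;
  let current = 0;
  let prev = null;
  for (const r of rows) {
    current = r.sha1 === prev ? current + 1 : 1;
    prev = r.sha1;
    best = Math.max(best, current);
  }
  return best;
}

console.log(mismatches(runs)); // iterations 6 and 7
console.log(longestIdenticalRun(runs)); // 5 (iterations 2 through 6)
```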

I’ve attached:

  • repro-variant-A.png — the sha1 128af81 file from iteration 2, byte-identical to the cached files at 3/4/5/6.
  • repro-variant-B.png — the sha1 768ab15 file from iteration 1, byte-identical to iteration 7.

Both files are generated entirely from the HTML repro above — no proprietary content.

Screenshots / Screen Recordings


Operating System

macOS

Version Information

  • Cursor: 3.3.30 (macOS build)
  • OS: macOS 26.4.1 (25E253)
  • MCP: built-in cursor-ide-browser
  • Tool: browser_take_screenshot — descriptor: “Take a screenshot of the current page. You can’t perform actions based on the screenshot, use browser_snapshot for actions.”

Additional Information

Side observation: inline preview in agent chat can also be stale

Independently of the on-disk issue, the image preview returned inline in the tool result (rendered to the agent and user in chat) sometimes shows a different image from what’s actually on disk at the same filename. I observed cases where Reading the PNG back gave one image while the inline result preview had shown a different, older one. This is a separate caching layer and less critical to fix, but worth mentioning since it can mislead agents into thinking a screenshot matches the snapshot when it doesn’t.

Why I think this is a screenshot-tool bug

  • browser_snapshot is consistent with subsequent browser_click / browser_search results on the same tab — refs resolve, clicks land, text searches match. So the snapshot is live.
  • The on-disk PNG hash doesn’t change across many calls even as wall clock advances and the DOM cycles variants — so the screenshot tool is returning cached output rather than capturing a fresh frame per call.
  • When the cache does refresh, it sometimes refreshes to a variant that has already been replaced — so the capture pipeline is not a simple “live frame buffer” either.

Why it matters

“Snapshot to understand DOM, screenshot to show the user” is one of the most common agent workflows with this MCP. When the two disagree silently:

  • The agent reports the wrong UI state to the user (e.g. “here’s the fix working” with a screenshot that’s actually the old state).
  • Agents burn tokens in retry loops trying to reconcile snapshot and screenshot.
  • Teams end up documenting brittle workarounds (“always Read the PNG back and manually compare to the snapshot before trusting it”).

Suggestions

  • Make browser_take_screenshot always capture a fresh frame on every call, or at minimum invalidate any per-tab/per-viewId PNG cache before writing.
  • If caching is intentional for performance, expose a fresh: true argument so the agent can opt in to a forced re-capture.
  • Also relevant: the inline preview rendered in the tool result is sometimes a different image from the PNG written to disk under the returned filename. Serving the same bytes through both channels would be an easy consistency win.
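
To make the second suggestion concrete, here is a hypothetical sketch of how an opt-in `fresh` argument could sit in front of a per-tab cache. It is not Cursor's implementation: `makeScreenshotTool`, `capture`, and the tab id are stand-ins invented for illustration, and the real tool's internals are unknown to me.

```javascript
// Hypothetical sketch, NOT Cursor's code. `capture` stands in for whatever
// actually grabs a frame for a tab and returns PNG bytes.
function makeScreenshotTool(capture) {
  const cache = new Map(); // tabId -> last captured PNG

  return function takeScreenshot(tabId, { fresh = false } = {}) {
    if (!fresh && cache.has(tabId)) {
      return cache.get(tabId); // cached path: may be stale
    }
    const png = capture(tabId); // forced re-capture
    cache.set(tabId, png);
    return png;
  };
}

// Demo with a fake capture that returns a new frame label on every call.
let frame = 0;
const takeScreenshot = makeScreenshotTool(() => `frame-${++frame}`);
console.log(takeScreenshot("tab-1")); // frame-1
console.log(takeScreenshot("tab-1")); // frame-1 (served from cache)
console.log(takeScreenshot("tab-1", { fresh: true })); // frame-2
```

Defaulting `fresh` to false keeps any existing performance behavior, while an agent that is about to show the image to a user can pay for a guaranteed re-capture.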

Happy to provide more detail or a screen recording if helpful.

Does this stop you from using Cursor

No - Cursor works, but with this issue

Hi @elijahcarrelgrailbio,

This is a well-documented bug report, thank you. The shasum evidence and the minimal repro make this very clear.

We’ve confirmed the issue. We’re tracking this internally. The fix will likely involve switching to a capture method that forces a fresh composite before returning the PNG.

For now, the most reliable workaround is to trust browser_snapshot for state verification and treat browser_take_screenshot as a best-effort visual – which it sounds like you’re already doing.