Gemini-2.5-pro can't see images returned by MCP servers

:white_check_mark: Check the forum to ensure the issue hasn’t been reported already Done.

:lady_beetle: Provide a clear description of the bug Gemini 2.5 Pro, used through Cursor, does not seem to correctly process images returned by an MCP (Model Context Protocol) server. The model appears to “hallucinate” the image’s contents: it claims to see the image, when in reality it might only be processing a textual placeholder (e.g., “image”). This contrasts with other models (e.g., Claude 3.5 Sonnet, Claude 3.7 Sonnet), which correctly interpret the same images returned via MCP.

:counterclockwise_arrows_button: Explain how to reproduce the bug (if known)

  1. Set up a request to an MCP server (either local/home-built or public/validated) that returns an image.
  2. Submit this request to Gemini 2.5 Pro via Cursor.
  3. Observe the model’s response: it will likely indicate that it sees the image, but its subsequent descriptions or analyses will suggest that it does not have access to the actual visual content.
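
For anyone trying to reproduce this without a full server, here is a minimal sketch of the kind of payload step 1 refers to. It assumes the MCP tool-result shape as I understand it from the spec (a `content` list whose image entries carry base64 `data` and a `mimeType`). The helper names (`make_test_png`, `image_tool_result`, `has_image_payload`) are hypothetical, and nothing here reflects how Cursor actually forwards the result to Gemini. It uses only the Python standard library.

```python
import base64
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(tag: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, tag, data, CRC over tag + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def make_test_png() -> bytes:
    """Return a valid 1x1 red PNG (8-bit RGB), small enough to eyeball in logs."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 2, 0, 0, 0)  # width, height, depth, RGB
    scanline = b"\x00" + b"\xff\x00\x00"  # filter byte 0, then one red pixel
    return (
        PNG_SIGNATURE
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(scanline))
        + _chunk(b"IEND", b"")
    )


def image_tool_result(png: bytes, request_id: int = 1) -> dict:
    """Wrap PNG bytes in a JSON-RPC response shaped like an MCP tool result.

    Assumption: image content blocks use the keys "type", "data", "mimeType".
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {
            "content": [
                {
                    "type": "image",
                    "data": base64.b64encode(png).decode("ascii"),
                    "mimeType": "image/png",
                }
            ],
            "isError": False,
        },
    }


def has_image_payload(response: dict) -> bool:
    """True if at least one content block is an image with non-empty data."""
    blocks = response.get("result", {}).get("content", [])
    return any(b.get("type") == "image" and b.get("data") for b in blocks)
```

A quick check such as `has_image_payload(image_tool_result(make_test_png()))` confirms the server side really emits image data. If the same tool works with Claude but Gemini describes something other than a single red pixel, that suggests the image is being dropped or replaced with a placeholder somewhere between the MCP result and the model.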

:camera: Attach screenshots or recordings (e.g., .jpg, .png, .mp4).

With Gemini 2.5 Pro


With Claude 3.5 Sonnet

:laptop: Tell us your operating system and your Cursor version (e.g., Windows, 0.x.x).

  • Operating systems tested: Windows (primarily) and Linux (via WSL).
  • Cursor version: 0.50.1

:prohibited: Tell us if the issue stops you from using Cursor. This bug significantly impacts the use of Gemini 2.5 Pro in Cursor for all tasks involving image analysis via MCP servers, thereby limiting the expected multimodal capabilities.