Overview:
Currently, agents can only analyze images that I manually include in prompts. When an agent finds an image file in our project on its own, it cannot analyze the file's contents.
Proposed Feature:
Allow agents to autonomously analyze images they encounter. This capability would enable the agent to perform actions based on the visual data it observes without manual intervention.
Use Case Examples:
- Automated Screenshot Analysis:
With browser-tools MCP, I automatically capture screenshots of my hot-reloaded Next.js app as the agent implements changes. If the agent could autonomously analyze these screenshots, I could simply instruct it to evaluate the visual output and make the necessary modifications, without manually providing the latest screenshot after every update.
- Iterative Image Generation:
Consider an AI image-generation MCP that creates a background image for a website. If the agent could analyze the returned image, it could automatically refine its prompts to the MCP, iterating through generations until it achieves the desired visual result.
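The refine-and-retry loop in the second example could be sketched roughly as below. Everything here is illustrative: `generate_image`, `analyze_image`, and `refine_prompt` are hypothetical stand-ins for MCP tool calls and the proposed analysis capability, not real SDK functions.

```python
# Hypothetical sketch of the agent-driven iteration loop. The three helper
# functions are placeholders for MCP tool calls and the proposed autonomous
# image-analysis step; their names and signatures are illustrative only.

def generate_image(prompt: str) -> bytes:
    # Stand-in for an image-generation MCP tool call.
    return prompt.encode()

def analyze_image(image: bytes) -> float:
    # Stand-in for the proposed capability: the agent inspects the image
    # and derives a quality score in [0, 1].
    return min(len(image) / 40, 1.0)

def refine_prompt(prompt: str, score: float) -> str:
    # Stand-in for the agent adjusting its prompt based on what it saw.
    return prompt + ", more detail"

def iterate(prompt: str, threshold: float = 0.9, max_rounds: int = 5):
    """Generate, analyze, and refine until the score clears the threshold."""
    score = 0.0
    for _ in range(max_rounds):
        image = generate_image(prompt)
        score = analyze_image(image)
        if score >= threshold:
            break
        prompt = refine_prompt(prompt, score)
    return prompt, score
```

The point of the sketch is the control flow: today the human has to close this loop by pasting each generated image back into the prompt; with autonomous analysis the agent could run it end to end.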
Benefits:
- Streamlines workflows by eliminating the need for manual image input.
- Enhances automation capabilities, allowing agents to react dynamically to visual data.
- Opens up a wide range of use cases, from responsive UI adjustments to creative iterative design processes.
I believe adding this feature would significantly enhance the agent's versatility and efficiency in handling visual content.