Read images from filesystem

Hi :waving_hand:

I use multiple MCP servers. One of them controls a Chrome instance and exposes a tool to take screenshots.

In Agent mode, the Agent calls this tool to analyze the current state of the page. The issue is that once the screenshot is taken, the Agent doesn’t actually use it. I believe that’s because, to pass the image back to the LLM, the Agent needs to “attach” it to the chat, and it doesn’t know how to do that.

See example of the Agent recognizing it should take a screenshot but not using it:

I confirmed this by opening a new chat and asking the Agent what’s in the image located at path “/tmp/…”. It couldn’t do it.
However, if I “attach” the image to the chat manually, it can successfully describe it.

Is there a way to have tools that write images “cooperate” with the way we pass images back to the LLM for analysis?
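
To make the question concrete, here is a rough sketch of what I imagine “cooperating” could look like on the tool side: instead of only returning a file path, the tool returns the image inline. The MCP spec defines an image content item with `type`, base64 `data`, and `mimeType`, and this sketch builds one from a file on disk. The function names `image_content_from_file` and `screenshot_tool_result` are hypothetical, not from any SDK, and I’m only assuming that Agent mode would forward such an item to the LLM; I haven’t verified that.

```python
import base64
import mimetypes
from pathlib import Path


def image_content_from_file(path: str) -> dict:
    """Build an MCP-style image content item from an image file on disk.

    Hypothetical helper: the dict shape follows the MCP image content
    type (type / data / mimeType); the function itself is illustrative.
    """
    # Guess the MIME type from the file extension (e.g. ".png" -> "image/png").
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"not a recognized image type: {path}")

    # MCP image content carries the raw bytes as a base64 string.
    data = Path(path).read_bytes()
    return {
        "type": "image",
        "data": base64.b64encode(data).decode("ascii"),
        "mimeType": mime_type,
    }


def screenshot_tool_result(path: str) -> dict:
    """Hypothetical tool result: the image inline, plus the path as text."""
    return {
        "content": [
            image_content_from_file(path),
            {"type": "text", "text": f"Screenshot saved to {path}"},
        ]
    }
```

If the client passed the `image` item through the same way it handles a manually attached image, the Agent would not need to read the file from disk at all.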

Thanks!

Did you find any workarounds?