Auto-add images to context or in other ways let models load images for vision

vincenzoml · April 29, 2025, 6:03pm

If I add a PNG to the chat context by dragging it in or @-mentioning it, vision-capable models can read it and do wonderful things. But if I just ask a model in agent mode to read the PNG and report on it, it can’t, and it starts trying to write OCR programs in Python and the like. Is it possible to let models use their vision automatically in YOLO mode? That would be really amazing.
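
To illustrate the kind of thing I mean, here is a rough sketch of a tool an agent could call to load an image file and pass it to the model as an image instead of text. Everything here is an assumption for illustration: `load_image_part` is a made-up name, not a Cursor API, and the returned dict is a generic "image content part" loosely modeled on common chat message formats.

```python
import base64
import mimetypes
from pathlib import Path


# Hypothetical helper: not a Cursor API. The dict layout is an assumed,
# generic "image content part", not any specific provider's schema.
def load_image_part(path: str) -> dict:
    """Read an image file and package it so a vision model could receive it directly."""
    file = Path(path)

    # Guess the media type from the file extension (e.g. ".png" -> "image/png").
    media_type, _ = mimetypes.guess_type(file.name)
    if media_type is None or not media_type.startswith("image/"):
        raise ValueError(f"{path} does not look like an image file")

    # Base64-encode the raw bytes so they can travel inside a JSON message.
    data = base64.b64encode(file.read_bytes()).decode("ascii")
    return {"type": "image", "media_type": media_type, "data": data}
```

With something like this available as an agent tool, "read screenshot.png and report on it" could attach the image to the model's context in one step, rather than falling back to OCR scripts.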

cshape · May 9, 2025, 5:58am

+1, this would be great.
