Textualization of Visual Information

A simple attempt to textualize images with vision experts and evaluate the resulting text-only pipeline on MMBench.

Poster image to be added

Description

Visual information (image labels, image captions, object labels, etc.) captures a human's or a vision model's understanding and analysis of the key features in an image. We investigate how well LLMs can perform visual perception and reasoning tasks directly when the visual information in an image is first converted to text by vision experts. We follow the LENS framework, evaluate on MMBench, and provide some insights.
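The textualization step described above can be illustrated with a minimal sketch. It assumes the vision experts have already produced their text outputs and only shows how those outputs might be assembled into a multiple-choice prompt for an LLM. The `VisualDescription` container, the `build_prompt` function, and the prompt wording are all hypothetical names chosen for illustration; they are not taken from the LENS code or from this project.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VisualDescription:
    """Text outputs of the vision experts for one image (hypothetical container)."""
    tags: List[str] = field(default_factory=list)      # image-level labels
    objects: List[str] = field(default_factory=list)   # detected object labels
    captions: List[str] = field(default_factory=list)  # one or more captions


def build_prompt(desc: VisualDescription, question: str, options: List[str]) -> str:
    """Assemble a text-only multiple-choice prompt from the textualized image.

    The layout below is an assumption for illustration, not the project's
    actual prompt template.
    """
    letters = "ABCDEFGH"
    lines = [
        "Image labels: " + ", ".join(desc.tags),
        "Object labels: " + ", ".join(desc.objects),
        "Captions:",
    ]
    lines += [f"- {caption}" for caption in desc.captions]
    lines.append(f"Question: {question}")
    # Pair each option with a letter; extra options beyond "H" are dropped.
    lines += [f"{letter}. {option}" for letter, option in zip(letters, options)]
    lines.append("Answer with the letter of the correct option.")
    return "\n".join(lines)


if __name__ == "__main__":
    desc = VisualDescription(
        tags=["dog", "park"],
        objects=["dog", "ball"],
        captions=["a dog chasing a ball on grass"],
    )
    print(build_prompt(desc, "What animal is shown?", ["cat", "dog"]))
```

The resulting string is what would be sent to the LLM in place of the image, so the model's answer depends entirely on how much of the image the vision experts managed to put into words.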

Project Members