FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations
This paper was accepted at the Workshop on Foundation Models in the Wild at ICLR 2025. Visual understanding is inherently contextual – what we focus on in an image depends on the task at hand. For instance, given an image of a person holding a bouquet of flowers, we may focus on either the person …