FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations

This paper was accepted at the Workshop on Foundation Models in the Wild at ICLR 2025.
Visual understanding is inherently contextual: what we focus on in an image depends on the task at hand. For instance, given an image of a person holding a bouquet of flowers, we may focus on either the person (for example, their clothing) or the type of flowers, depending on the context of interest. Yet most existing image encoding paradigms represent an image as a fixed, generic feature vector, overlooking the need to prioritize different visual information for different downstream use cases. In…