FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations

This paper was accepted at the Workshop on Foundation Models in the Wild at ICLR 2025.
Visual understanding is inherently contextual: what we focus on in an image depends on the task at hand. For instance, given an image of a person holding a bouquet of flowers, we may focus either on the person (for example, their clothing) or on the type of flowers, depending on the context of interest. Yet most existing image encoding paradigms represent an image as a fixed, generic feature vector, overlooking the need to prioritize different visual information for different downstream use cases. In…