Referring to Screen Texts with Voice Assistants

Voice assistants help users make phone calls, send messages, create events, navigate, and much more. However, assistants have limited capacity to understand their users’ context. In this work, we aim to take a step toward addressing that limitation. We explore a new experience that lets users refer to phone numbers, addresses, email addresses, URLs, and dates on their phone screens. Our focus lies in reference understanding, which becomes particularly interesting when multiple similar texts are present on screen, a setting akin to visual grounding. We collect a dataset and propose a lightweight…