Referring to Screen Texts with Voice Assistants

Voice assistants help users make phone calls, send messages, create events, navigate, and much more. However, assistants have limited capacity to understand their users’ context. In this work, we aim to take a step toward closing that gap. We explore a new experience that lets users refer to phone numbers, addresses, email addresses, URLs, and dates on their phone screens. Our focus is reference understanding, which becomes particularly interesting when multiple similar texts are present on screen, a setting similar to visual grounding. We collect a dataset and propose a lightweight…
AI Generated Robotic Content
