
Referring to Screen Texts with Voice Assistants

Voice assistants help users make phone calls, send messages, create events, navigate, and much more. However, assistants have limited capacity to understand their users’ context. In this work, we take a step toward closing that gap. We explore a new experience in which users refer to phone numbers, addresses, email addresses, URLs, and dates on their phone screens. Our focus is reference understanding, which, much like visual grounding, becomes particularly interesting when multiple similar texts are present on screen. We collect a dataset and propose a lightweight…
AI Generated Robotic Content
