High-Definition Segmentation in Google Meet

Posted by Tingbo Hou and Juhyun Lee, Software Engineers, Google

In recent years, video conferencing has played an increasingly important role in both work and personal communication for many users. Over the past two years, we have enhanced this experience in Google Meet by introducing privacy-preserving machine learning (ML) powered background features, also known as …

Using ML to Boost Engagement with a Maternal and Child Health Program in India

Posted by Aparna Taneja, Software Engineer, and Milind Tambe, Principal Scientist, Google Research, India Research Lab

The widespread availability of mobile phones has enabled non-profits to deliver critical health information to their beneficiaries in a timely manner. While advanced applications on smartphones allow for richer multimedia content and two-way communication between beneficiaries and health coaches, …

UVQ: Measuring YouTube’s Perceptual Video Quality

Posted by Yilin Wang, Staff Software Engineer, YouTube, and Feng Yang, Senior Staff Software Engineer, Google Research

Online video sharing platforms, like YouTube, need to understand perceptual video quality (i.e., a user’s subjective perception of video quality) in order to better optimize and improve user experience. Video quality assessment (VQA) attempts to build a bridge …

OptFormer: Towards Universal Hyperparameter Optimization with Transformers

Posted by Yutian Chen, Staff Research Scientist, DeepMind, and Xingyou (Richard) Song, Research Scientist, Google Research, Brain Team

One of the most important aspects of machine learning is hyperparameter optimization, as finding the right hyperparameters for a machine learning task can make or break a model’s performance. Internally, we regularly use Google Vizier as the …

Towards Helpful Robots: Grounding Language in Robotic Affordances

Posted by Brian Ichter and Karol Hausman, Research Scientists, Google Research, Brain Team

Over the last several years, we have seen significant progress in applying machine learning to robotics. However, robotic systems today are capable of executing only very short, hard-coded commands, such as “Pick up an apple,” because they tend to perform best with …

Rax: Composable Learning-to-Rank Using JAX

Posted by Rolf Jagerman and Honglei Zhuang, Software Engineers, Google Research

Ranking is a core problem across a variety of domains, such as search engines, recommendation systems, or question answering. As such, researchers often utilize learning-to-rank (LTR), a set of supervised machine learning techniques that optimize for the utility of an entire list of items …

Efficient Video-Text Learning with Iterative Co-tokenization

Posted by AJ Piergiovanni and Anelia Angelova, Research Scientists, Google Research, Brain Team

Video is a ubiquitous source of media content that touches on many aspects of people’s day-to-day lives. Increasingly, real-world video applications, such as video captioning, video content analysis, and video question-answering (VideoQA), rely on models that can connect video content with text …

Introducing the Google Universal Image Embedding Challenge

Posted by Bingyi Cao, Software Engineer, Google Research, and Mário Lipovský, Software Engineer, Google Lens

Computer vision models see daily application for a wide variety of tasks, ranging from object recognition to image-based 3D object reconstruction. One challenging type of computer vision problem is instance-level recognition (ILR): given an image of an object, the …

Building Efficient Multiple Visual Domain Models with Multi-path Neural Architecture Search

Posted by Qifei Wang, Senior Software Engineer, and Feng Yang, Senior Staff Software Engineer, Google Research

Deep learning models for visual tasks (e.g., image classification) are usually trained end-to-end with data from a single visual domain (e.g., natural images or computer-generated images). Typically, an application that completes visual tasks for multiple domains would need …

Efficient Sequence Modeling for On-Device ML

Posted by Arun Kandoor, Software Engineer, Google Research

The increasing demand for machine learning (ML) model inference on-device (for mobile devices, tablets, etc.) is driven by the rise of compute-intensive applications, the need to keep certain data on device for privacy and security reasons, and the desire to provide services when a network connection may …