Vocoders are models capable of transforming a low-dimensional spectral representation of an audio signal, typically the mel spectrogram, to a…
We introduce GAUDI, a generative model capable of capturing the distribution of complex and realistic 3D scenes that can be…
Posted by Richard C. Gerkin, Google Research, and Alexander B. Wiltschko, Google Did you ever try to measure a smell?…
When most people think of using machine learning (ML) with audio data, the use case that usually comes to mind…
Like two valedictorians, SimInsights and Photomath tell stories worth hearing about how AI is advancing education. SimInsights in Irvine, Calif.,…
Voice trigger detection is an important task, which enables activating a voice assistant when a target user speaks a keyword…
We present a differentiable rendering framework for material and lighting estimation from multi-view images and a reconstructed geometry. In the…
All-neural, end-to-end ASR systems gained rapid interest from the speech recognition community. Such systems convert speech input to text units…
Quantization, knowledge distillation, and magnitude pruning are among the most popular methods for neural network compression in NLP. Independently, these…
We introduce CVNets, a high-performance open-source library for training deep neural networks for visual recognition tasks, including classification, detection, and…