Many AWS customers have been successfully using Amazon Transcribe to accurately, efficiently, and automatically convert their customer audio conversations to…
Vocoders are models capable of transforming a low-dimensional spectral representation of an audio signal, typically the mel spectrogram, to a…
We introduce GAUDI, a generative model capable of capturing the distribution of complex and realistic 3D scenes that can be…
Posted by Richard C. Gerkin, Google Research, and Alexander B. Wiltschko, Google Did you ever try to measure a smell?…
When most people think of using machine learning (ML) with audio data, the use case that usually comes to mind…
Like two valedictorians, SimInsights and Photomath tell stories worth hearing about how AI is advancing education. SimInsights in Irvine, Calif.,…
Voice trigger detection is an important task, which enables activating a voice assistant when a target user speaks a keyword…
We present a differentiable rendering framework for material and lighting estimation from multi-view images and a reconstructed geometry. In the…
All-neural, end-to-end ASR systems gained rapid interest from the speech recognition community. Such systems convert speech input to text units…
Quantization, knowledge distillation, and magnitude pruning are among the most popular methods for neural network compression in NLP. Independently, these…