Picture 1ML 5216 6

Improve transcription accuracy of customer-agent calls with custom vocabulary in Amazon Transcribe

Many AWS customers have been successfully using Amazon Transcribe to accurately, efficiently, and automatically convert their customer audio conversations to text, and extract actionable insights from them. These insights can help you continuously enhance the processes and products that directly improve the quality and experience for your customers. In many countries, such as India, English …

Mel Spectrogram Inversion with Stable Pitch

Vocoders are models capable of transforming a low-dimensional spectral representation of an audio signal, typically the mel spectrogram, to a waveform. Modern speech generation pipelines use a vocoder as their final component. Recent vocoder models developed for speech achieve a high degree of realism, such that it is natural to wonder how they would perform …

GAUDI: A Neural Architect for Immersive 3D Scene Generation

We introduce GAUDI, a generative model capable of capturing the distribution of complex and realistic 3D scenes that can be rendered immersively from a moving camera. We tackle this challenging problem with a scalable yet powerful approach, where we first optimize a latent representation that disentangles radiance fields and camera poses. This latent representation is …

image5

Digitizing Smell: Using Molecular Maps to Understand Odor

Posted by Richard C. Gerkin, Google Research, and Alexander B. Wiltschko, Google Did you ever try to measure a smell? …Until you can measure their likenesses and differences you can have no science of odor. If you are ambitious to found a new science, measure a smell. — Alexander Graham Bell, 1914. How can we …

ML 8724 image001

Detect audio events with Amazon Rekognition

When most people think of using machine learning (ML) with audio data, the use case that usually comes to mind is transcription, also known as speech-to-text. However, there are other useful applications, including using ML to detect sounds. Using software to detect a sound is called audio event detection, and it has a number of …

Model Teachers: Startups Make Schools Smarter With Machine Learning

Like two valedictorians, SimInsights and Photomath tell stories worth hearing about how AI is advancing education. SimInsights in Irvine, Calif., uses NVIDIA conversational AI to make virtual and augmented reality classes lifelike for college students and employee training. Photomath — founded in Zagreb, Croatia and based in San Mateo, Calif. — created an app using …

Improving Voice Trigger Detection with Metric Learning

Voice trigger detection is an important task, which enables activating a voice assistant when a target user speaks a keyword phrase. A detector is typically trained on speech data independent of speaker information and used for the voice trigger detection task. However, such a speaker independent voice trigger detector typically suffers from performance degradation on …

NeILF: Neural Incident Light Field for Material and Lighting Estimation

We present a differentiable rendering framework for material and lighting estimation from multi-view images and a reconstructed geometry. In the framework, we represent scene lightings as the Neural Incident Light Field (NeILF) and material properties as the surface BRDF modelled by multi-layer perceptrons. Compared with recent approaches that approximate scene lightings as the 2D environment …

Integrating Categorical Features in End-To-End ASR

All-neural, end-to-end ASR systems gained rapid interest from the speech recognition community. Such systems convert speech input to text units using a single trainable neural network model. E2E models require large amounts of paired speech text data that is expensive to obtain. The amount of data available varies across different languages and dialects. It is …

Combining Compressions for Multiplicative Size Scaling on Natural Language Tasks

Quantization, knowledge distillation, and magnitude pruning are among the most popular methods for neural network compression in NLP. Independently, these methods reduce model size and can accelerate inference, but their relative benefit and combinatorial inter- actions have not been rigorously studied. For each of the eight possible subsets of these techniques, we compare accuracy vs. …