Improvements to Embedding-Matching Acoustic-to-Word ASR Using Multiple-Hypothesis Pronunciation-Based Embeddings

In embedding-matching acoustic-to-word (A2W) ASR, every word in the vocabulary is represented by a fixed-dimension embedding vector that can be added or removed independently of the rest of the system. The approach is potentially an elegant solution for the dynamic out-of-vocabulary (OOV) word problem, where speaker- and context-dependent named entities like contact names must be incorporated into the ASR on the fly for every speech utterance at test time. Challenges still remain, however, in improving the overall accuracy of embedding-matching A2W. In this paper, we contribute two methods…
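To make the idea concrete, below is a minimal sketch of the embedding-matching principle the abstract describes: each vocabulary word carries a fixed-dimension embedding, decoding picks the word whose embedding best matches an acoustic embedding, and OOV entries (e.g., contact names) can be added or removed at test time without touching the rest of the system. All names here (the cosine-similarity scorer, `add_word`, `match`, the 256-dimensional embeddings) are illustrative assumptions, not the paper's actual architecture or its multiple-hypothesis pronunciation-based embeddings.

```python
# Illustrative sketch of embedding-matching A2W decoding (assumptions, not the paper's method).
import numpy as np

EMB_DIM = 256  # assumed fixed embedding dimension

# Vocabulary: word -> fixed-dimension embedding (e.g., derived from its pronunciation).
vocab = {
    "call": np.random.randn(EMB_DIM),
    "mom": np.random.randn(EMB_DIM),
}

def add_word(word: str, embedding: np.ndarray) -> None:
    """Register an OOV word (e.g., a contact name) at test time, independently of the acoustic model."""
    vocab[word] = embedding

def remove_word(word: str) -> None:
    """Drop a word from the vocabulary without retraining anything."""
    vocab.pop(word, None)

def match(acoustic_embedding: np.ndarray) -> str:
    """Return the vocabulary word whose embedding best matches the acoustic embedding (cosine similarity)."""
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
    return max(vocab, key=lambda w: cos(vocab[w], acoustic_embedding))

# Example: dynamically add a speaker-dependent named entity, then decode
# stand-in per-word acoustic embeddings produced by some acoustic encoder.
add_word("anneliese", np.random.randn(EMB_DIM))
hypothesis = [match(e) for e in np.random.randn(3, EMB_DIM)]
print(hypothesis)
```

Because the word inventory is just a table of vectors, updating the recognizer for a new utterance reduces to editing that table, which is what makes the approach attractive for dynamic OOV handling.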