
Personalization of CTC-based End-to-End Speech Recognition Using Pronunciation-Driven Subword Tokenization

Recent advances in deep learning and automatic speech recognition have boosted the accuracy of end-to-end speech recognition to a new level. However, recognition of personal content such as contact names remains a challenge. In this work, we present a personalization solution for an end-to-end system based on connectionist temporal classification. Our solution uses a class-based language model, in which a general language model models the context around named entity classes, and personal named entities are compiled into a separate finite-state transducer. We further introduce a…
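To make the class-based language model idea concrete, here is a minimal toy sketch. It is not the authors' implementation: the bigram table `GENERAL_LM`, the class token `$CONTACT`, the uniform distribution over a user's contacts, and the function names are all assumptions chosen for illustration, and a plain Python list stands in for the compiled finite-state transducer. The general model scores how likely a class token is in context, and the per-user list scores the entity within that class.

```python
import math

# Hypothetical general LM: bigram log-probabilities over words and class tokens.
# The values are made up for illustration only.
GENERAL_LM = {
    ("call", "$CONTACT"): math.log(0.6),
    ("call", "back"): math.log(0.1),
}


def class_member_logprob(entity, personal_entities):
    """Log-probability of an entity within its class.

    Assumes a uniform distribution over the user's own entities; a real
    system would compile these into a finite-state transducer instead.
    """
    if entity not in personal_entities:
        return float("-inf")
    return -math.log(len(personal_entities))


def score(prev_word, entity, class_token, personal_entities):
    """Combine context and class-member scores in log space:
    log P(class | context) + log P(entity | class).
    """
    context_lp = GENERAL_LM.get((prev_word, class_token), float("-inf"))
    return context_lp + class_member_logprob(entity, personal_entities)


contacts = ["alice smith", "bob jones"]
known = score("call", "alice smith", "$CONTACT", contacts)
unknown = score("call", "carol white", "$CONTACT", contacts)
```

Because the personal list is kept apart from the general model, swapping in a different user's contacts changes only `personal_entities`; the general model stays untouched.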