Categories: FAANG

Personalization of CTC-based End-to-End Speech Recognition Using Pronunciation-Driven Subword Tokenization

Recent advances in deep learning and automatic speech recognition have boosted the accuracy of end-to-end speech recognition to a new level. However, recognition of personal content such as contact names remains a challenge. In this work, we present a personalization solution for an end-to-end system based on connectionist temporal classification. Our solution uses class-based language model, in which a general language model provides modeling of the context for named entity classes, and personal named entities are compiled in a separate finite state transducer. We further introduce a…
AI Generated Robotic Content

Recent Posts

Radial Attention: O(nlogn) Sparse Attention with Energy Decay for Long Video Generation

We just released RadialAttention, a sparse attention mechanism with O(nlog⁡n) computational complexity for long video…

8 hours ago

Mixture of Experts Architecture in Transformer Models

This post covers three main areas: • Why Mixture of Experts is Needed in Transformers…

8 hours ago

Your First Local LLM API Project in Python Step-By-Step

Interested in leveraging a large language model (LLM) API locally on your machine using Python…

8 hours ago

Use Amazon SageMaker Unified Studio to build complex AI workflows using Amazon Bedrock Flows

Organizations face the challenge to manage data, multiple artificial intelligence and machine learning (AI/ML) tools,…

8 hours ago

Capital One builds agentic AI modeled after its own org chart to supercharge auto sales

Capital One's head of AI foundations explained at VB Transform on how the bank patterned…

9 hours ago

A Pro-Russia Disinformation Campaign Is Using Free AI Tools to Fuel a ‘Content Explosion’

Consumer-grade AI tools have supercharged Russian-aligned disinformation as pictures, videos, QR codes, and fake websites…

9 hours ago