Categories: FAANG

Personalization of CTC-based End-to-End Speech Recognition Using Pronunciation-Driven Subword Tokenization

Recent advances in deep learning and automatic speech recognition have boosted the accuracy of end-to-end speech recognition to a new level. However, recognition of personal content such as contact names remains a challenge. In this work, we present a personalization solution for an end-to-end system based on connectionist temporal classification. Our solution uses class-based language model, in which a general language model provides modeling of the context for named entity classes, and personal named entities are compiled in a separate finite state transducer. We further introduce a…
AI Generated Robotic Content

Recent Posts

iPhone 2007 [FLUX.2 Klein]

A Lora trained on photos taken with the original Apple iPhone (2007). Works with FLUX.2…

7 hours ago

Building a ‘Human-in-the-Loop’ Approval Gate for Autonomous Agents

In agentic AI systems , when an agent's execution pipeline is intentionally halted, we have…

7 hours ago

ProText: A Benchmark Dataset for Measuring (Mis)gendering in Long-Form Texts

We introduce ProText, a dataset for measuring gendering and misgendering in stylistically diverse long-form English…

7 hours ago

Build reliable AI agents with Amazon Bedrock AgentCore Evaluations

Your AI agent worked in the demo, impressed stakeholders, handled test scenarios, and seemed ready…

7 hours ago

Robotaxi Outage in China Leaves Passengers Stranded on Highways

A suspected system failure froze Baidu’s robotaxis across Wuhan, trapping passengers and reportedly causing traffic…

8 hours ago

Chip-scale light technology could power faster AI and data center communications

Researchers at Trinity have developed a new light-based technology on a tiny chip that could…

8 hours ago