Joint Speech Transcription and Translation: Pseudo-Labeling with Out-of-Distribution Data

Self-training has been shown to help address data scarcity in many domains, including vision, speech, and language. Specifically, self-training, or pseudo-labeling, assigns labels to unlabeled data and adds the result to the training pool. In this work, we investigate and use pseudo-labeling for a recently proposed setup: joint transcription and translation of speech, which suffers from a lack of sufficient parallel data resources. We show that under such data-deficient circumstances, the unlabeled data can vary significantly in domain from the supervised data, which results in…
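
To make the pseudo-labeling idea concrete, here is a minimal sketch of one self-training round. It is not the method from this work: a toy 1-D nearest-centroid classifier stands in for the speech model, and the names `fit_centroids`, `predict`, and `self_train`, along with the data, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def fit_centroids(examples):
    """Fit a toy 1-D nearest-centroid classifier: one mean per label."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y in examples:
        sums[y] += x
        counts[y] += 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(centroids[y] - x))


def self_train(labeled, unlabeled):
    """One pseudo-labeling round (illustrative assumption, not the paper's recipe)."""
    teacher = fit_centroids(labeled)                         # train on supervised data
    pseudo = [(x, predict(teacher, x)) for x in unlabeled]   # label the unlabeled data
    pool = labeled + pseudo                                  # add it to the training pool
    student = fit_centroids(pool)                            # retrain on the enlarged pool
    return student, pool


# Hypothetical toy data: (feature, label) pairs plus unlabeled features.
labeled = [(0.0, "a"), (1.0, "a"), (9.0, "b"), (10.0, "b")]
unlabeled = [2.0, 8.0]
student, pool = self_train(labeled, unlabeled)
```

In this sketch every pseudo-label is accepted as is. When the unlabeled data comes from a different domain than the supervised data, the teacher's labels can be less reliable, so a real pipeline might filter or reweight them before retraining.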