Joint Speech Transcription and Translation: Pseudo-Labeling with Out-of-Distribution Data

Self-training has been shown to help address data scarcity in many domains, including vision, speech, and language. Specifically, self-training, or pseudo-labeling, labels unlabeled data and adds it to the training pool. In this work, we investigate and use pseudo-labeling for a recently proposed setup: joint transcription and translation of speech, which suffers from a lack of sufficient parallel data resources. We show that under such data-deficient circumstances, the unlabeled data can significantly vary in domain from the supervised data, which results in…
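
To make the pseudo-labeling idea concrete, here is a minimal sketch of one self-training round in Python. It is not the method from this work: the nearest-centroid model on 1-D features is a toy stand-in for a speech model, and the names `train_centroids`, `predict`, `pseudo_label_round`, and the `min_margin` confidence filter are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Toy stand-in: a 1-D feature with a string label instead of (speech, text) pairs.
Example = Tuple[float, str]


def train_centroids(data: List[Example]) -> Dict[str, float]:
    """Fit a toy nearest-centroid model: the mean feature value per label."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for x, y in data:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(model: Dict[str, float], x: float) -> Tuple[str, float]:
    """Return (label, margin); margin is the distance gap between the two nearest centroids."""
    dists = sorted((abs(x - c), y) for y, c in model.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def pseudo_label_round(
    labeled: List[Example], unlabeled: List[float], min_margin: float
) -> Tuple[Dict[str, float], List[Example]]:
    """One self-training round: label the unlabeled pool, keep confident items, retrain."""
    model = train_centroids(labeled)
    pseudo: List[Example] = []
    for x in unlabeled:
        y, margin = predict(model, x)
        # Assumed filter: drop low-confidence pseudo-labels before adding them.
        if margin >= min_margin:
            pseudo.append((x, y))
    # Retrain on the supervised data plus the accepted pseudo-labeled data.
    return train_centroids(labeled + pseudo), pseudo


if __name__ == "__main__":
    labeled = [(0.0, "a"), (1.0, "a"), (9.0, "b"), (10.0, "b")]
    unlabeled = [0.2, 5.1, 9.8]
    new_model, accepted = pseudo_label_round(labeled, unlabeled, min_margin=1.0)
    print(accepted)
    print(new_model)
```

In this toy run the ambiguous point near the middle of the two classes is filtered out, which hints at why unlabeled data that differs in domain from the supervised data needs care: pseudo-labels for such data are the ones most likely to be wrong.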