
Joint Speech Transcription and Translation: Pseudo-Labeling with Out-of-Distribution Data

Self-training has been shown to help address data scarcity in many domains, including vision, speech, and language. Specifically, self-training, or pseudo-labeling, assigns labels to unlabeled data and adds it to the training pool. In this work, we investigate and use pseudo-labeling for a recently proposed novel setup: joint transcription and translation of speech, which suffers from an absence of sufficient parallel data resources. We show that under such data-deficient circumstances, the unlabeled data can vary significantly in domain from the supervised data, which results in…
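The core loop the abstract describes can be sketched as follows. This is a minimal, generic illustration of pseudo-labeling, not the paper's implementation: the `model` callable, the confidence threshold, and the helper names are all assumptions for the sake of the example.

```python
def pseudo_label(model, unlabeled, threshold=0.9):
    """Label unlabeled examples with the current model, keeping only
    predictions whose confidence clears the threshold.

    `model` is assumed to map an input to a (label, confidence) pair;
    this interface is hypothetical, chosen for illustration.
    """
    selected = []
    for x in unlabeled:
        label, confidence = model(x)
        if confidence >= threshold:
            selected.append((x, label))
    return selected


def self_train_step(train_pool, unlabeled, model, threshold=0.9):
    # Merge confident pseudo-labeled pairs into the supervised pool;
    # in practice the model would then be retrained on the result.
    return train_pool + pseudo_label(model, unlabeled, threshold)
```

In a real speech setup the "labels" would be transcripts and translations produced by a trained joint model, and filtering heuristics are typically more elaborate than a single confidence cutoff.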
AI Generated Robotic Content
