Joint Speech Transcription and Translation: Pseudo-Labeling with Out-of-Distribution Data

Self-training has been shown to help address data scarcity in many domains, including vision, speech, and language. Specifically, self-training, or pseudo-labeling, labels unlabeled data with a model's own predictions and adds the result to the training pool. In this work, we investigate and use pseudo-labeling for a recently proposed setup: joint transcription and translation of speech, which suffers from a lack of sufficient parallel data resources. We show that under such data-deficient circumstances, the unlabeled data can vary significantly in domain from the supervised data, which results in…
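
As a rough illustration of the general pseudo-labeling idea described above, and not the method used in this work, here is a minimal Python sketch of one self-training round. Everything in it is an assumption made for illustration: the toy nearest-centroid "model", the names `train_centroids`, `predict`, and `pseudo_label_round`, and the confidence `threshold` used to decide which pseudo-labels are kept. A real speech system would replace the toy model with a transcription and translation model.

```python
from typing import Dict, List, Sequence, Tuple

# Toy stand-in for a (speech, text) pair: a 1-D feature and a class label.
Example = Tuple[float, int]


def train_centroids(data: Sequence[Example]) -> Dict[int, float]:
    """Hypothetical toy 'model': the mean feature value of each class."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for x, y in data:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(model: Dict[int, float], x: float) -> Tuple[int, float]:
    """Return (label, confidence); needs at least two classes.

    Confidence is a margin score in [0.5, 1]: close to 1 when x is much
    nearer to the winning centroid than to the runner-up.
    """
    dists = sorted((abs(x - c), y) for y, c in model.items())
    (d1, y1), (d2, _) = dists[0], dists[1]
    conf = 1.0 if d1 + d2 == 0 else d2 / (d1 + d2)
    return y1, conf


def pseudo_label_round(
    labeled: Sequence[Example],
    unlabeled: Sequence[float],
    threshold: float = 0.8,  # assumed filter, not taken from the source
) -> Tuple[Dict[int, float], List[Example]]:
    """One self-training round: train, label, filter, retrain."""
    model = train_centroids(labeled)
    pseudo: List[Example] = []
    for x in unlabeled:
        y, conf = predict(model, x)
        if conf >= threshold:  # keep only confident pseudo-labels
            pseudo.append((x, y))
    # Add the pseudo-labeled examples to the training pool and retrain.
    return train_centroids(list(labeled) + pseudo), pseudo


if __name__ == "__main__":
    labeled = [(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1)]
    unlabeled = [0.2, 5.1, 9.8]
    new_model, pseudo = pseudo_label_round(labeled, unlabeled)
    print(pseudo)
    print(new_model)
```

In this toy run, the point near the midpoint between the two classes falls below the threshold and is left out of the training pool, while the two points close to a centroid are pseudo-labeled and used for retraining.
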