Neural Transducer Training: Reduced Memory Consumption with Sample-wise Computation

The neural transducer is an end-to-end model for automatic speech recognition (ASR). While the model is well suited for streaming ASR, training it remains challenging: the memory requirements can quickly exceed the capacity of state-of-the-art GPUs, limiting batch sizes and sequence lengths. In this work, we analyze the time and space complexity of a typical transducer training setup. We propose a memory-efficient training method that computes the transducer loss and gradients sample by sample. We present optimizations to increase the efficiency and parallelism of the…
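
To make the memory bottleneck concrete: fully batched transducer training materializes a padded joint tensor of shape (B, T_max, U_max + 1, V), while a sample-wise scheme only ever holds one (1, T_i, U_i + 1, V) slice at a time. The sketch below illustrates that idea under assumptions not taken from the paper: PyTorch with torchaudio's rnnt_loss, and hypothetical encoder, predictor, and joiner modules. It is a minimal illustration of the technique, not the authors' implementation, whose efficiency and parallelism optimizations the abstract only alludes to.

```python
# A minimal sketch of sample-wise transducer loss computation, assuming
# PyTorch and torchaudio.functional.rnnt_loss. The `encoder`, `predictor`,
# and `joiner` modules, argument names, and shapes are hypothetical
# illustrations, not the paper's implementation.
import torch
import torchaudio.functional as TAF


def samplewise_step(encoder, predictor, joiner, optimizer,
                    feats, enc_lens, tokens, token_lens, blank=0):
    """One training step whose peak memory holds a single
    (1, T_i, U_i + 1, V) joint tensor rather than the padded
    (B, T_max, U_max + 1, V) tensor of fully batched training."""
    optimizer.zero_grad()
    B = feats.size(0)

    # The encoder and predictor still run batched; only the 4-D joiner
    # output (the memory bottleneck) is computed per sample.
    enc = encoder(feats)       # (B, T_max, H_enc); enc_lens are output lengths
    pred = predictor(tokens)   # (B, U_max + 1, H_pred)

    # Detach so each per-sample backward stops at these activations;
    # their gradients accumulate in .grad and are pushed through the
    # encoder/predictor graph once, after the loop.
    enc_d = enc.detach().requires_grad_()
    pred_d = pred.detach().requires_grad_()

    total = 0.0
    for i in range(B):
        t, u = int(enc_lens[i]), int(token_lens[i])
        logits = joiner(enc_d[i:i + 1, :t], pred_d[i:i + 1, :u + 1])
        loss = TAF.rnnt_loss(
            logits,                        # (1, t, u + 1, V)
            tokens[i:i + 1, :u].int(),
            enc_lens[i:i + 1].int(),
            token_lens[i:i + 1].int(),
            blank=blank,
        ) / B                              # match the batched mean loss
        loss.backward()                    # frees this sample's joint graph
        total += float(loss)

    # Single backward pass through the shared encoder/predictor graph.
    torch.autograd.backward([enc, pred], [enc_d.grad, pred_d.grad])
    optimizer.step()
    return total
```

One design note on this sketch: detaching the encoder and predictor outputs keeps the per-sample loop cheap. Each backward() frees only that sample's joint graph, and the shared encoder/predictor graph is traversed once after the loop rather than once per sample, so the result matches a batched step while the peak activation memory scales with a single sample's lengths.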