
Neural Transducer Training: Reduced Memory Consumption with Sample-wise Computation

The neural transducer is an end-to-end model for automatic speech recognition (ASR). While the model is well-suited for streaming ASR, the training process remains challenging. During training, the memory requirements may quickly exceed the capacity of state-of-the-art GPUs, limiting batch size and sequence lengths. In this work, we analyze the time and space complexity of a typical transducer training setup. We propose a memory-efficient training method that computes the transducer loss and gradients sample by sample. We present optimizations to increase the efficiency and parallelism of the…
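As a rough illustration of why memory becomes the bottleneck, the sketch below estimates the size of a dense joint-network output, assuming the commonly used transducer layout of shape (batch, frames, labels + 1, vocabulary). The function name, the 4-byte float size, and all example dimensions are hypothetical and are not taken from the paper; the paper's own complexity analysis may account for further terms.

```python
def joint_tensor_bytes(batch, frames, labels, vocab, bytes_per_value=4):
    """Bytes needed for a dense tensor of shape (batch, frames, labels + 1, vocab).

    Assumption: one value per (sample, acoustic frame, label position, vocab entry),
    stored as 32-bit floats by default.
    """
    return batch * frames * (labels + 1) * vocab * bytes_per_value


# Hypothetical sizes, chosen only for illustration.
B, T, U, V = 16, 500, 100, 1024

# Batched computation: the whole padded batch is held in memory at once.
batched = joint_tensor_bytes(B, T, U, V)

# Sample-wise computation: only one sample's tensor is held at a time.
sample_wise = joint_tensor_bytes(1, T, U, V)

print(f"batched:     {batched / 2**30:.2f} GiB")
print(f"sample-wise: {sample_wise / 2**30:.2f} GiB")
print(f"reduction:   {batched // sample_wise}x")
```

Under these assumptions the peak size of this one tensor shrinks by a factor equal to the batch size. With real data the saving can be larger still, because a batched tensor is padded to the longest sequence in the batch while each sample-wise tensor only needs its own lengths.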
