Neural Transducer Training: Reduced Memory Consumption with Sample-wise Computation

The neural transducer is an end-to-end model for automatic speech recognition (ASR). While the model is well-suited for streaming ASR, the training process remains challenging. During training, the memory requirements may quickly exceed the capacity of state-of-the-art GPUs, limiting batch size and sequence lengths. In this work, we analyze the time and space complexity of a typical transducer training setup. We propose a memory-efficient training method that computes the transducer loss and gradients sample by sample. We present optimizations to increase the efficiency and parallelism of the…
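To illustrate the general idea behind sample-wise computation, the sketch below shows one way a transducer training step could cap peak memory: the joint tensor and loss are computed for one utterance at a time, and gradients are accumulated across the loop, so only a single (1, T_i, U_i+1, V) joint tensor is materialized instead of the padded (B, T_max, U_max+1, V) batch tensor. The `encoder`, `predictor`, `joiner`, and `rnnt_loss` names are placeholders (any RNN-T loss, e.g. torchaudio.functional.rnnt_loss, could stand in); this is a hedged sketch of the concept, not the paper's actual implementation or its additional efficiency optimizations.

import torch

def sample_wise_transducer_step(encoder, predictor, joiner, rnnt_loss,
                                feats, feat_lens, targets, target_lens):
    """Accumulate gradients one utterance at a time so that only one
    sample's joint tensor exists at any moment (hypothetical sketch)."""
    batch_size = feats.size(0)
    total_loss = 0.0
    # Encoder and predictor can still run on the full batch; only the joint
    # network and the loss, which dominate memory, are computed per sample.
    enc_out = encoder(feats, feat_lens)          # (B, T_max, D_enc)
    pred_out = predictor(targets, target_lens)   # (B, U_max + 1, D_pred)
    for i in range(batch_size):
        t_i = int(feat_lens[i])
        u_i = int(target_lens[i])
        # Trim padding before the joiner to keep the joint tensor small.
        joint = joiner(enc_out[i : i + 1, :t_i],
                       pred_out[i : i + 1, : u_i + 1])  # (1, T_i, U_i + 1, V)
        loss = rnnt_loss(joint, targets[i : i + 1, :u_i],
                         feat_lens[i : i + 1], target_lens[i : i + 1])
        # Scale so accumulated gradients match the batch-mean loss; the
        # shared encoder/predictor graph is reused, so retain it.
        (loss / batch_size).backward(retain_graph=True)
        total_loss += float(loss.detach())
    return total_loss / batch_size

After the loop, the optimizer step proceeds as usual on the accumulated gradients; the trade-off is extra per-sample kernel launches in exchange for a peak joint-tensor memory footprint bounded by the largest single utterance rather than the padded batch.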