
Neural Transducer Training: Reduced Memory Consumption with Sample-wise Computation

The neural transducer is an end-to-end model for automatic speech recognition (ASR). While the model is well-suited for streaming ASR, the training process remains challenging. During training, the memory requirements may quickly exceed the capacity of state-of-the-art GPUs, limiting batch size and sequence lengths. In this work, we analyze the time and space complexity of a typical transducer training setup. We propose a memory-efficient training method that computes the transducer loss and gradients sample by sample. We present optimizations to increase the efficiency and parallelism of the…
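As a rough illustration of the idea, the sketch below computes the transducer (RNN-T) loss and its gradients one sample at a time instead of over the whole batch, assuming a PyTorch setup with torchaudio.functional.rnnt_loss. The joiner module, the gradient scaling, and the graph-retention details are illustrative assumptions, not the paper's exact algorithm.

```python
# Minimal sketch of sample-wise transducer loss computation.
# Assumes: `joiner` is a joint network mapping (1, T, D_enc) and
# (1, U+1, D_pred) activations to logits of shape (1, T, U+1, V).
# These names and shapes are hypothetical, not from the paper.
import torchaudio.functional as F

def sample_wise_rnnt_step(joiner, enc_out, pred_out, targets,
                          enc_lens, tgt_lens, blank):
    """Accumulate transducer-loss gradients one sample at a time.

    Peak memory for the loss is bounded by the largest single-sample
    joint lattice (T_i x (U_i + 1) x V) rather than the full batched
    lattice (B x T_max x (U_max + 1) x V).
    """
    batch_size = enc_out.size(0)
    total_loss = 0.0
    for i in range(batch_size):
        T_i, U_i = int(enc_lens[i]), int(tgt_lens[i])
        # Slice out one sample at its true length; keep a batch dim of 1.
        enc_i = enc_out[i : i + 1, :T_i]          # (1, T_i, D_enc)
        pred_i = pred_out[i : i + 1, : U_i + 1]   # (1, U_i + 1, D_pred)
        logits_i = joiner(enc_i, pred_i)          # (1, T_i, U_i + 1, V)
        loss_i = F.rnnt_loss(
            logits_i,
            targets[i : i + 1, :U_i].int(),
            enc_lens[i : i + 1].int(),
            tgt_lens[i : i + 1].int(),
            blank=blank,
        )
        # Scale so the accumulated gradient matches a batch-mean loss.
        # retain_graph keeps the shared encoder/prediction graph alive
        # for later samples; the savings target the joint lattice,
        # which dominates transducer training memory.
        (loss_i / batch_size).backward(retain_graph=True)
        total_loss += loss_i.detach()
    return total_loss / batch_size
```

In this form the per-sample lattices are freed after each backward pass, so peak loss memory scales with the longest single utterance rather than with the batch, at the cost of a serial Python-level loop; the optimizations the abstract mentions address exactly that loss of efficiency and parallelism.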