Neural Transducer Training: Reduced Memory Consumption with Sample-wise Computation

The neural transducer is an end-to-end model for automatic speech recognition (ASR). While the model is well-suited for streaming ASR, the training process remains challenging. During training, the memory requirements may quickly exceed the capacity of state-of-the-art GPUs, limiting batch size and sequence lengths. In this work, we analyze the time and space complexity of a typical transducer training setup. We propose a memory-efficient training method that computes the transducer loss and gradients sample by sample. We present optimizations to increase the efficiency and parallelism of the…
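To make the sample-wise idea concrete, below is a minimal sketch in PyTorch using torchaudio's rnnt_loss. It is not the paper's implementation: the joiner module, tensor shapes, and helper names are assumptions for illustration. What it demonstrates is that the joint tensor of shape (T, U + 1, V), which costs O(B · T · U · V) memory when materialized for a whole batch, is built for one sample at a time, with backward() called immediately so the per-sample activations can be freed before the next sample is processed.

```python
import torch
import torchaudio.functional as F


def sample_wise_transducer_loss(encoder_out, predictor_out, joiner,
                                targets, enc_lengths, tgt_lengths, blank=0):
    """Compute the RNN-T loss and gradients one sample at a time.

    encoder_out:   (B, T_max, D_enc) batched encoder states
    predictor_out: (B, U_max + 1, D_pred) batched predictor states
    joiner:        hypothetical module mapping a pair of broadcastable
                   encoder/predictor tensors to logits (1, T, U + 1, V)
    """
    batch_size = encoder_out.size(0)
    total_loss = torch.zeros((), device=encoder_out.device)
    for i in range(batch_size):
        T_i = int(enc_lengths[i])
        U_i = int(tgt_lengths[i])
        enc_i = encoder_out[i : i + 1, :T_i]           # (1, T_i, D_enc)
        pred_i = predictor_out[i : i + 1, : U_i + 1]   # (1, U_i + 1, D_pred)
        # Only this sample's joint tensor (1, T_i, U_i + 1, V) is
        # materialized, instead of the full (B, T_max, U_max + 1, V).
        logits_i = joiner(enc_i.unsqueeze(2), pred_i.unsqueeze(1))
        loss_i = F.rnnt_loss(
            logits_i,
            targets[i : i + 1, :U_i].int(),
            enc_lengths[i : i + 1].int(),
            tgt_lengths[i : i + 1].int(),
            blank=blank,
            reduction="sum",
        )
        # Backward per sample frees this sample's joint tensor right away;
        # gradients accumulate in the model parameters. retain_graph keeps
        # the shared (batched) encoder/predictor graph alive until the
        # last sample has been processed.
        (loss_i / batch_size).backward(retain_graph=(i < batch_size - 1))
        total_loss += loss_i.detach()
    return total_loss / batch_size
```

In a training step, zero the gradients before calling this function and run the optimizer step afterwards; the per-sample backward calls accumulate the same parameter gradients as a single batched backward would. Note that this sketch only avoids the batched joint tensor, which is typically the dominant memory term, and the Python loop sacrifices parallelism; recovering that efficiency is the subject of the optimizations the abstract alludes to.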