Neural Transducer Training: Reduced Memory Consumption with Sample-wise Computation
The neural transducer is an end-to-end model for automatic speech recognition (ASR). While the model is well suited for streaming ASR, training it remains challenging: memory requirements may quickly exceed the capacity of even state-of-the-art GPUs, limiting batch size and sequence length. In this work, we analyze the time and space complexity of …
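For context, the dominant memory cost in standard transducer training comes from the joint network's output lattice, whose activations scale with batch size × acoustic frames × label positions × vocabulary size. The sketch below estimates that footprint for illustrative sizes (the function name and all numbers are assumptions for illustration, not figures from the work above):

```python
def transducer_logit_bytes(batch, frames, labels, vocab, bytes_per_elem=4):
    """Rough memory estimate for the transducer joint-network logits.

    The joint network emits one logit per (frame, label, vocab) entry,
    so activations grow as B * T * U * V -- the source of the memory
    pressure described above. Sizes here are illustrative assumptions.
    """
    return batch * frames * labels * vocab * bytes_per_elem

# e.g. batch of 16, 1000 frames, 100 labels, 1024-token vocab, fp32:
gb = transducer_logit_bytes(16, 1000, 100, 1024) / 1e9
print(f"{gb:.1f} GB")  # prints "6.6 GB" -- for the logits tensor alone
```

Because this single tensor can consume several gigabytes before gradients and optimizer state are counted, reducing its footprint (e.g. by computing the loss sample-wise rather than over the whole padded batch) directly enables larger batches and longer sequences.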