Integrating Categorical Features in End-To-End ASR
All-neural, end-to-end (E2E) ASR systems have rapidly gained interest from the speech recognition community. Such systems convert speech input to text units using a single trainable neural network model. E2E models require large amounts of paired speech-text data, which is expensive to obtain. The amount of data available varies across languages and dialects. It is …