SpeakStream: Streaming Text-to-Speech with Interleaved Data
With the increasing integration of speech front-ends and large language models (LLMs), there is a need to explore architectures that integrate these modalities. While end-to-end models have been explored extensively, cascaded models that stream outputs from LLMs to TTS seem oddly under-explored, even though they are potentially much simpler. Using traditional text-to-speech systems …