Improving the Quality of Neural TTS Using Long-form Content and Multi-speaker Multi-style Modeling

Neural text-to-speech (TTS) can provide quality close to natural speech if an adequate amount of high-quality speech material is available for training. However, acquiring speech data for TTS training is costly and time-consuming, especially if the goal is to generate different speaking styles. In this work, we show that we can transfer speaking style across speakers and improve the quality of synthetic speech by training a multi-speaker multi-style (MSMS) model with long-form recordings, in addition to regular TTS recordings. In particular, we show that 1) multi-speaker modeling improves the…
AI Generated Robotic Content
