Categories: FAANG

Emphasis Control for Parallel Neural TTS

Recent parallel neural text-to-speech (TTS) synthesis methods are able to generate speech with high fidelity while maintaining high performance. However, these systems often lack control over the output prosody, thus restricting the semantic information conveyable for a given text. This paper proposes a hierarchical parallel neural TTS system for prosodic emphasis control by learning a latent space that directly corresponds to a change in emphasis. Three candidate features for the latent space are compared: 1) Variance of pitch and duration within words in a sentence, 2) Wavelet-based feature…
AI Generated Robotic Content

Recent Posts

Revolutionizing Construction

How Cavanagh and Palantir Are Building Construction’s OS for the 21st CenturyEditor’s Note: This blog post…

24 hours ago

Building a voice-driven AWS assistant with Amazon Nova Sonic

As cloud infrastructure becomes increasingly complex, the need for intuitive and efficient management interfaces has…

24 hours ago

Cloud CISO Perspectives: Our 2026 Cybersecurity Forecast report

Welcome to the first Cloud CISO Perspectives for December 2025. Today, Francis deSouza, COO and…

24 hours ago

As AI Grows More Complex, Model Builders Rely on NVIDIA

Unveiling what it describes as the most capable model series yet for professional knowledge work,…

24 hours ago

How Harmonic Security improved their data-leakage detection system with low-latency fine-tuned models using Amazon SageMaker, Amazon Bedrock, and Amazon Nova Pro

This post was written with Bryan Woolgar-O’Neil, Jamie Cockrill and Adrian Cunliffe from Harmonic Security…

2 days ago