Emphasis Control for Parallel Neural TTS

Recent parallel neural text-to-speech (TTS) synthesis methods can generate speech with high fidelity while maintaining high performance. However, these systems often lack control over the output prosody, which restricts the semantic information that can be conveyed for a given text. This paper proposes a hierarchical parallel neural TTS system for prosodic emphasis control by learning a latent space that directly corresponds to a change in emphasis. Three candidate features for the latent space are compared: 1) variance of pitch and duration within words in a sentence, 2) wavelet-based feature…
AI Generated Robotic Content
