
Is your conversational AI setting the right tone?

Conversational AI is too artificial

Nothing is more frustrating than calling a customer support line to be greeted by a monotone, robotic, automated voice. The voice on the other end of the phone is taking painfully long to read you the menu options. You’re two seconds away from either hanging up, screaming “representative” into the phone, or pounding on the zero button until you reach a human agent. That’s the problem with many IVR solutions today. Conversational AI is too artificial. Customers feel they’re not being heard or listened to, so they just want to speak with a human agent.

IBM Watson Expressive Voices

Luckily, there is a way to fix that problem and make the customer experience more pleasant. With IBM Watson’s new expressive voices, you no longer feel like you’re talking to a typical robot; the experience is closer to talking with a live human agent, without any of the wait time. These highly natural voices have conversational capabilities such as expressive styles, emotions, word emphasis and interjections. Not only do they relieve the frustration customers feel when talking to a bot, but they also contribute to the goal of call deflection from human agents. It’s a win-win for customers and businesses.

Best suited for the customer care domain, the voices will have a conversational style enabled by default; however, the voices also support a neutral style which may be optimal for other use cases (newscasting, e-learning, audio books, etc.).

Emotions, Emphasis, Interjections

As humans, we convey emotion in the words we speak, whether we realize it or not. We tend to sound empathetic when apologizing to one another. We sound uncertain when we don’t know the answer to something, and perhaps cheerful when we finally discover the answer. The ability to convey emotion is what makes us human. IBM Watson’s expressive voices can express emotion in order to better convey the meaning behind the words, ultimately reducing customer frustration when dealing with today’s phone experiences. Your voice bot will sound empathetic when telling the customer their package is delayed, or cheerful when it has successfully helped the customer book an airline ticket.
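
To make the idea concrete, here is a minimal sketch of how a bot might tag a sentence with an emotion before sending it to the speech service. The `<express-as>` element and the style names are assumptions based on the emotions named in this post; check the service documentation for the exact markup. The helper name `express_as` is hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr

# Assumed style names: the three emotions this post mentions plus "neutral".
ASSUMED_STYLES = {"cheerful", "empathetic", "uncertain", "neutral"}


def express_as(text: str, style: str) -> str:
    """Wrap text in an <express-as> element (assumed tag and attribute names)."""
    if style not in ASSUMED_STYLES:
        raise ValueError(f"unknown style: {style!r}")
    # escape() protects &, < and > so the text cannot break the markup.
    return f"<express-as style={quoteattr(style)}>{escape(text)}</express-as>"


delay_notice = express_as("I'm sorry, your package is delayed.", "empathetic")
booking_done = express_as("Great news, your flight is booked!", "cheerful")
print(delay_notice)
print(booking_done)
```

Validating the style up front keeps a typo from silently falling back to the default voice style.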

Emphasis is another important aspect of human speech. Did you say Austin or August? Did you say you lost the card ending in 4876? IBM expressive voices support word emphasis so that your bot can better convey the desired meaning of the text. Users can indicate the location of the stress with four levels – none, moderate, strong, and reduced.
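
As a rough illustration, the stress can be marked with the standard SSML `<emphasis>` element, using the four levels listed above. This is a sketch, not IBM's reference usage; the helper names `emphasize` and `disambiguate` are hypothetical.

```python
from xml.sax.saxutils import escape

# The four emphasis levels named in the paragraph above.
LEVELS = ("none", "moderate", "strong", "reduced")


def emphasize(word: str, level: str = "moderate") -> str:
    """Return the word wrapped in an SSML <emphasis> element."""
    if level not in LEVELS:
        raise ValueError(f"level must be one of {LEVELS}, got {level!r}")
    return f'<emphasis level="{level}">{escape(word)}</emphasis>'


def disambiguate(before: str, word: str, after: str, level: str = "strong") -> str:
    """Build a confirmation prompt that stresses the word being confirmed."""
    return f"{escape(before)}{emphasize(word, level)}{escape(after)}"


prompt = disambiguate("Did you say ", "Austin", " or August?")
print(prompt)
```

Stressing only the word under confirmation mirrors how a human agent would ask the question.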

Interjecting with words like hmm, um, oh, aha, or huh is another feature of human speech that IBM expressive voices now support to enable an interaction that feels more natural and human-like. The new expressive voices will automatically detect these interjections in text and treat them as such without any SSML (Speech Synthesis Markup Language) indication. There’s also an option to disable the interjections when it’s not appropriate (e.g., ‘oh’ can be used to spell out the number 0 or as an interjection).
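
The sketch below shows one way the opt-out might look when ‘oh’ stands for the digit 0. The `<say-as interpret-as="interjection" enabled="false">` markup is an assumption about how the service exposes this option and should be verified against the documentation; the helper names `literal_word` and `speak` are hypothetical.

```python
from xml.sax.saxutils import escape


def literal_word(word: str) -> str:
    """Mark a word so it is not read as an interjection (assumed markup)."""
    return (
        '<say-as interpret-as="interjection" enabled="false">'
        f"{escape(word)}</say-as>"
    )


def speak(*parts: str) -> str:
    """Join already-escaped SSML fragments inside a <speak> root element."""
    return "<speak>" + "".join(parts) + "</speak>"


# The first "Oh" is left alone so the voice can treat it as an interjection;
# the second "oh" is the digit 0 and is marked as a plain word.
ssml = speak(
    escape("Oh, I see. Your gate is B "),
    literal_word("oh"),
    escape(" seven."),
)
print(ssml)
```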

How to Get Started with Expressive Voices

Expressive voices and features will be available in US-English first in September 2022, followed by other languages in early 2023. The US-English expressive voices are Michael, Allison, Lisa, and Emma. For customers using the V3 version of Michael, Allison, or Lisa, switching to the expressive voices shouldn’t cause disruption, as each will still sound like the same speaker but with a more natural and conversational style. It’s easy to start using the new voices: simply specify the voice name in your API request, just like any other voice.
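
As a sketch of what "specify the voice name" could look like, the snippet below assembles (but does not send) a synthesis request. The `en-US_<Speaker>Expressive` naming pattern, the `/v1/synthesize` path, the `voice` query parameter, and the placeholder service URL are all assumptions for illustration; confirm them in the API reference before use.

```python
from urllib.parse import urlencode

# The four US-English expressive speakers named above.
EXPRESSIVE_SPEAKERS = ("Michael", "Allison", "Lisa", "Emma")


def voice_id(speaker: str) -> str:
    """Map a speaker name to a voice identifier (assumed naming pattern)."""
    if speaker not in EXPRESSIVE_SPEAKERS:
        raise ValueError(f"unknown expressive speaker: {speaker!r}")
    return f"en-US_{speaker}Expressive"


def build_synthesize_request(service_url: str, speaker: str, text: str,
                             accept: str = "audio/wav") -> dict:
    """Describe a synthesis request as a plain dict; nothing is sent."""
    query = urlencode({"voice": voice_id(speaker)})
    return {
        "method": "POST",
        # Assumed endpoint path and query parameter.
        "url": f"{service_url.rstrip('/')}/v1/synthesize?{query}",
        "headers": {"Content-Type": "application/json", "Accept": accept},
        "json": {"text": text},
    }


# Placeholder URL; a real instance URL and credentials come from your account.
request = build_synthesize_request(
    "https://example.invalid/instances/demo/", "Allison", "How can I help?"
)
print(request["url"])
```

Because only the voice identifier changes, moving an existing integration to an expressive voice would be a one-line change under these assumptions.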

In summary, IBM’s new technology of expressive voices is the next level of conversational AI. It checks the box when it comes to an engaging and natural experience that mirrors that of a human agent. The new voices relieve the customer frustration of feeling unheard and drive call deflection from human agents. To learn more about the expressive voices, see the resources below.

The post Is your conversational AI setting the right tone? appeared first on Journey to AI Blog.

AI Generated Robotic Content
