This post was cowritten by Rishi Srivastava and Scott Reynolds from Clarus Care.
Many healthcare practices today struggle with managing high volumes of patient calls efficiently. From appointment scheduling and prescription refills to billing inquiries and urgent medical concerns, practices face the challenge of providing timely responses while maintaining quality patient care. Traditional phone systems often lead to long hold times, frustrated patients, and overwhelmed staff who manually process and prioritize hundreds of calls daily. These communication bottlenecks not only impact patient satisfaction but can also delay critical care coordination.
In this post, we illustrate how Clarus Care, a healthcare contact center solutions provider, worked with the AWS Generative AI Innovation Center (GenAIIC) team to develop a generative AI-powered contact center prototype. This solution enables conversational interaction and multi-intent resolution through an automated voicebot and chat interface. It also incorporates a scalable service model to support growth, human transfer capabilities (when requested or for urgent cases), and an analytics pipeline for performance insights.
Clarus Care is a healthcare technology company that helps medical practices manage patient communication through an AI-powered call management system. By automatically transcribing, prioritizing, and routing patient messages, Clarus improves response times, reduces staff workload, and minimizes hold times. Clarus is the fastest-growing healthcare call management company, serving over 16,000 users across 40+ specialties. The company handles 15 million patient calls annually and maintains a 99% client retention rate.
Clarus is embarking on an innovative journey to transform their patient communication system from a traditional menu-driven Interactive Voice Response (IVR) to a more natural, conversational experience. The company aims to revolutionize how patients interact with healthcare providers by creating a generative AI-powered contact center capable of understanding and addressing multiple patient intents in a single interaction. Previously, patients navigated through rigid menu options to leave messages, which were then transcribed and processed. This approach, while functional, limited the system's ability to handle complex patient needs efficiently. Recognizing the need for a more intuitive and flexible solution, Clarus collaborated with the GenAIIC to develop an AI-powered contact center that can comprehend natural language conversation, manage multiple intents, and provide a seamless experience across both voice and web chat interfaces. Key success criteria for the project were:
The GenAIIC team collaborated with Clarus to create a generative AI-powered contact center using Amazon Connect and Amazon Lex, integrated with Amazon Nova and Anthropic’s Claude 3.5 Sonnet foundation models through Amazon Bedrock. Connect was selected as the core system due to its ability to maintain 99.99% availability while providing comprehensive contact center capabilities across voice and chat channels.
The model flexibility of Bedrock is central to the system, allowing task-specific model selection based on accuracy and latency. Claude 3.5 Sonnet was used for its high-quality natural language understanding capabilities, and Nova models offered optimization for low latency and comparable natural language understanding and generation capabilities. The following diagram illustrates the solution architecture for the main contact center solution:
The workflow consists of the following high-level steps:
The models used for each specific function are described in the solution detail sections that follow.
Some challenges the team tackled during the development process included:
In addition to voice calls, the team developed a web interface using Amazon CloudFront and Amazon S3 Static Website Hosting that demonstrates the system’s multichannel capabilities. This interface shows how patients can engage in AI-powered conversations through a chat widget, providing the same level of service and functionality as voice calls. While the web interface demo uses the same contact flow as the voice call, it can be further customized for chat-specific language.
The team also built an analytics pipeline that processes conversation logs to provide valuable insights into system performance and patient interactions. A customizable dashboard offers a user-friendly interface for visualizing this data, allowing both technical and non-technical staff to gain actionable insights from patient communications. The analytics pipeline and dashboard were built using a previously published reusable GenAI contact center asset.
The solution employs a sophisticated conversation management system that orchestrates natural patient interactions through the multi-model capabilities of Bedrock and carefully designed prompt layering. At the heart of this system is the ability of Bedrock to provide access to multiple foundation models, enabling the team to select the optimal model for each specific task based on accuracy, cost, and latency requirements. The flow of the conversation management system is shown in the following image; NLU stands for natural language understanding.
The conversation flow begins with a greeting and urgency assessment. When a patient calls, the system immediately evaluates whether the situation requires urgent attention using Bedrock APIs. This first step makes sure that emergency cases are quickly identified and routed appropriately. The system uses a focused prompt that analyzes the patient’s initial statement against a predefined list of urgent intent categories, returning either “urgent” or “non_urgent” to guide subsequent handling.
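A minimal sketch of this urgency check using the Amazon Bedrock Converse API is shown below. The urgent category list, model ID, and prompt wording are illustrative assumptions, not Clarus's production configuration:

```python
# Hypothetical urgent intent categories for illustration only.
URGENT_CATEGORIES = ["chest pain", "difficulty breathing", "severe bleeding"]

# Assumed model choice for a low-latency classification task.
MODEL_ID = "us.amazon.nova-lite-v1:0"

def build_urgency_prompt(patient_statement: str) -> str:
    """Compose a focused prompt that asks for a one-word urgency label."""
    return (
        "You triage patient phone calls. Urgent categories: "
        + "; ".join(URGENT_CATEGORIES)
        + ". Respond with exactly one word, 'urgent' or 'non_urgent'.\n"
        + f"Patient: {patient_statement}"
    )

def parse_urgency(model_text: str) -> str:
    """Normalize the model reply to one of the two allowed labels."""
    text = model_text.lower()
    return "urgent" if "urgent" in text and "non_urgent" not in text else "non_urgent"

def classify_urgency(patient_statement: str) -> str:
    """Call the Bedrock Converse API (requires AWS credentials to run)."""
    import boto3  # deferred so the pure helpers above remain testable offline
    client = boto3.client("bedrock-runtime")
    response = client.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": build_urgency_prompt(patient_statement)}]}],
    )
    return parse_urgency(response["output"]["message"]["content"][0]["text"])
```

Returning a constrained label rather than free text keeps the downstream routing logic deterministic even when the model adds extra words.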
Following this, the system moves to intent detection. A key innovation here is the system's ability to process multiple intents within a single interaction. Rather than forcing patients through rigid menu trees, the system leverages powerful language models to understand when a patient mentions both a prescription refill and a billing question, queuing these intents for sequential processing while maintaining natural conversation flow. During this extraction, the system captures both the intent and the supporting quote from the user's input. This produces two results:
Once the system begins processing intents sequentially, it prompts the user for the data required to service the intent at hand. This happens in two interdependent stages:
These two steps happen in a loop until the required information is collected. At this stage, the system also considers provider-specific services, collecting the fields required for each provider. The solution automatically matches provider names mentioned by patients to the correct provider in the system. This handles variations like "Dr. Smith" matching to "Dr. Jennifer Smith" or "Jenny Smith," removing the rigid name matching or extension requirements of traditional IVR systems.

The solution also includes smart handoff capabilities. When the system needs to determine whether a patient should speak with a specific provider, it analyzes the conversation context to consider urgency and routing needs for the expressed intent. This process preserves the conversation context and collected information, facilitating a seamless experience when human intervention is requested.

Throughout the conversation, the system maintains comprehensive state tracking through Lex session attributes, while the natural language processing occurs through Bedrock model invocations. These attributes serve as the conversation's memory, storing everything from the patient's collected information and conversation history to the queue of detected intents. This state management enables the system to maintain context across multiple Bedrock API calls, creating a more natural dialogue flow.
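As a minimal illustration of the provider name matching idea, the sketch below uses token-level fuzzy matching from Python's standard library; the actual solution performs this matching through Bedrock model calls, and the provider directory here is a made-up example.

```python
from difflib import SequenceMatcher

# Hypothetical provider directory; real deployments would load this
# from the practice's configuration.
PROVIDERS = ["Dr. Jennifer Smith", "Dr. Raj Patel", "Dr. Maria Gonzalez"]

def _tokens(name: str) -> list[str]:
    """Lowercase a name, drop punctuation, and strip honorific titles."""
    return [t for t in name.lower().replace(".", "").split() if t not in {"dr", "doctor"}]

def match_provider(spoken_name: str, providers=PROVIDERS, cutoff=0.6):
    """Return the provider whose name tokens best cover the spoken tokens, or None."""
    spoken = _tokens(spoken_name)
    if not spoken:
        return None
    best, best_score = None, 0.0
    for provider in providers:
        cand = _tokens(provider)
        # Score each spoken token against its closest candidate token,
        # so "Dr. Smith" matches "Dr. Jennifer Smith" on the shared surname.
        scores = [max(SequenceMatcher(None, s, c).ratio() for c in cand) for s in spoken]
        score = sum(scores) / len(scores)
        if score > best_score:
            best, best_score = provider, score
    return best if best_score >= cutoff else None
```

Scoring per token rather than on the whole string is what lets a partial mention like a bare surname clear the match threshold.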
The intent management system was designed through a hierarchical service model structure that reflects how patients naturally express their needs. To traverse this hierarchical service model, the user inputs are parsed using natural language understanding, which are handled through Bedrock API calls.
The hierarchical service model organizes intents into three primary levels:
This structure enables the system to efficiently navigate possible intents while maintaining flexibility for customization across different healthcare facilities. Each intent in the model includes custom instructions that can be dynamically injected into Bedrock prompts, allowing for highly configurable behavior without code changes.

The intent extraction process leverages the advanced language understanding capabilities of Bedrock through a prompt that instructs the model to identify the intents present in a patient's natural language input. The prompt includes comprehensive instructions about what constitutes a new intent, the complete list of possible intents, and formatting requirements for the response. Rather than forcing classification into a single intent, the prompt is designed to detect multiple needs expressed simultaneously. Once intents are identified, they are added to a processing queue. The system then works through each intent sequentially, making additional model calls in multiple layers to collect required information through natural conversation.

To optimize for both quality and latency, the solution leverages the model selection flexibility of Bedrock, matching each conversation task to a suitable model:
Doing this helps in making sure that the solution can:
The entire intent management pipeline benefits from the unified Amazon Bedrock Converse API, which provides:
By implementing this hierarchical intent management system, Clarus can offer patients a more natural and efficient communication experience while maintaining the structure needed for proper routing and information collection. The flexibility of combining the multi-model capabilities of Bedrock with a configurable service model allows for straightforward customization per healthcare facility while keeping the core conversation logic consistent and maintainable. As new models become available in Bedrock, the system can be updated to leverage improved capabilities without major architectural changes, facilitating long-term scalability and performance optimization.
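The per-task model routing described above can be sketched as follows. The task-to-model mapping is a hypothetical example of the pattern, not the exact assignment used in the Clarus solution:

```python
# Illustrative routing table: each conversation task is bound to the
# Bedrock model that best balances its accuracy and latency needs.
TASK_MODELS = {
    "urgency_check": "us.amazon.nova-micro-v1:0",                       # lowest latency
    "intent_extraction": "anthropic.claude-3-5-sonnet-20240620-v1:0",   # highest accuracy
    "response_generation": "us.amazon.nova-lite-v1:0",                  # balanced
}

def converse_for_task(task: str, user_text: str, system_prompt: str = "") -> str:
    """Invoke the Converse API with the model registered for this task."""
    import boto3  # deferred so the routing table stays testable without AWS
    client = boto3.client("bedrock-runtime")
    kwargs = {
        "modelId": TASK_MODELS[task],
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
    }
    if system_prompt:
        kwargs["system"] = [{"text": system_prompt}]
    response = client.converse(**kwargs)
    return response["output"]["message"]["content"][0]["text"]
```

Because the Converse API uses one request shape across models, swapping a model in or out is a one-line change to the routing table.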
The scheduling component of the solution is handled in a separate, purpose-built module. If an ‘appointment’ intent is detected in the main handler, processing is passed to the scheduling module. The module operates as a state machine consisting of conversation states and next steps. The overall flow of the scheduling system is shown below:
There are three main LLM prompts used in the scheduling flow:
The possible steps are:
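Conceptually, such a state machine can be sketched as a transition table. The state and event names below are illustrative assumptions, since the post does not enumerate them:

```python
# Hypothetical scheduling states and transitions. Each event is the
# LLM's classification of the patient's latest turn.
TRANSITIONS = {
    "collect_patient_info": {"complete": "offer_slots", "incomplete": "collect_patient_info"},
    "offer_slots": {"slot_chosen": "confirm", "no_match": "offer_slots"},
    "confirm": {"confirmed": "done", "rejected": "offer_slots"},
}

def next_state(current: str, event: str) -> str:
    """Advance the conversation; unknown events keep the current state."""
    return TRANSITIONS.get(current, {}).get(event, current)
```

Keeping the transitions in data rather than code makes the flow easy to audit and to extend per practice.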
In the future, Clarus can integrate the contact center’s voicebot with Amazon Nova Sonic. Nova Sonic is a speech-to-speech LLM that delivers real-time, human-like voice conversations with leading price performance and low latency. Nova Sonic is now directly integrated with Connect.
Amazon Bedrock offers several additional features that help with scaling the solution and deploying it to production, including:
In this post, we demonstrated how the GenAIIC team collaborated with Clarus Care to develop a generative AI-powered healthcare contact center using Amazon Connect, Amazon Lex, and Amazon Bedrock. The solution showcases a conversational voice interface capable of handling multiple patient intents, managing appointment scheduling, and providing smart transfer capabilities. By leveraging Amazon Nova and Anthropic's Claude 3.5 Sonnet language models and AWS services, the system achieves high availability while offering a more intuitive and efficient patient communication experience.

The solution also incorporates an analytics pipeline for monitoring call quality and metrics, as well as a web interface demonstrating multichannel support. The solution's architecture provides a scalable foundation that can adapt to Clarus Care's growing customer base and future service offerings.

The transition from a traditional menu-driven IVR to an AI-powered conversational interface enables Clarus to enhance the patient experience, increase automation capabilities, and streamline healthcare communications. As they move towards implementation, this solution will empower Clarus Care to meet the evolving needs of both patients and healthcare providers in an increasingly digital healthcare landscape.
If you want to implement a similar solution for your use case, see the post Deploy generative AI agents in your contact center for voice and chat using Amazon Connect, Amazon Lex, and Amazon Bedrock Knowledge Bases for the infrastructure setup.
Brian Halperin