This post is cowritten with David Kim and Premjit Singh from Ring.
Scaling self-service support globally presents challenges beyond translation. In this post, we show you how Ring, Amazon’s home security subsidiary, built a production-ready, multi-locale Retrieval-Augmented Generation (RAG)-based support chatbot using Amazon Bedrock Knowledge Bases. By eliminating per-Region infrastructure deployments, Ring reduced the cost of scaling to each additional locale by 21%. At the same time, Ring maintained consistent customer experiences across 10 international Regions.
In this post, you’ll learn how Ring implemented metadata-driven filtering for Region-specific content, separated content management into ingestion/evaluation and promotion workflows, and achieved cost savings while scaling up. The architecture described in this post uses Amazon Bedrock Knowledge Bases, Amazon Bedrock, AWS Lambda, AWS Step Functions, and Amazon Simple Storage Service (Amazon S3). Whether you’re expanding support operations internationally or looking to optimize your existing RAG architecture, this implementation provides practical patterns you can apply to your own multi-locale support systems.
Customer support at Ring initially relied on a rule-based chatbot built with Amazon Lex. While functional, the system had limitations with predefined conversation patterns that couldn’t handle the diverse range of customer inquiries. During peak periods, 16% of interactions escalated to human agents, and support engineers spent 10% of their time maintaining the rule-based system. As Ring expanded across international locales, this approach became unsustainable.
Ring faced a challenge: how to provide accurate, contextually relevant support across multiple international locales without creating separate infrastructure for each Region. The team identified four requirements that would inform their architectural approach.
The international presence of Ring required more than translation. Each territory needed Region-specific product information, from voltage specifications to regulatory compliance details, provided through a unified system. Across the UK, Germany, and eight other locales, Ring needed to handle distinct product configurations and support scenarios for each Region.
Ring wanted their engineering team focused on improving customer experience, not managing infrastructure. The team needed a fully managed, serverless solution.
With hundreds of product guides, troubleshooting documents, and support articles constantly being updated, Ring needed vector search technology that could retrieve precise information from a unified repository. The system had to support automated content ingestion pipelines so that the Ring content team could publish updates that would become available across multiple locales without manual intervention.
The average end-to-end latency requirement for Ring was 7–8 seconds, and performance analysis revealed that cross-Region latency accounted for less than 10% of total response time. This finding allowed Ring to adopt a centralized architecture rather than deploying separate infrastructure in each Region, which reduced operational complexity and costs.
To address these requirements, Ring implemented metadata-driven filtering with content locale tags. This approach serves Region-specific content from a single centralized system. For their serverless requirements, Ring chose Amazon Bedrock Knowledge Bases and Lambda, which removed the need for infrastructure management while providing automatic scaling.
Ring designed their RAG-based chatbot architecture to separate content management into two core processes: Ingestion & Evaluation and Promotion. This two-phase approach allows Ring to maintain continuous content improvement while keeping production systems stable.
Figure 1: Architecture diagram showing the Ring ingestion and evaluation workflow with Step Functions orchestrating daily knowledge base creation, evaluation, and quality validation using Knowledge Bases and S3 storage.
contentLocale classification (for example, {locale}/Service.Ring.{Upsert/Delete}.{unique_identifier}.json)

For the doorbell battery example, the Ring metadata and content files have the following structure:
{locale}/Service.Ring.{Upsert/Delete}.{unique_identifier}.metadata.json
{locale}/Service.Ring.{Upsert/Delete}.{unique_identifier}.json
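To make the sidecar relationship concrete, here is a minimal sketch of generating the metadata file body. It follows the metadataAttributes shape that Amazon Bedrock Knowledge Bases reads from a companion .metadata.json file in Amazon S3; the en_GB locale value is an illustrative assumption, not Ring's actual data.

```python
import json


def build_metadata(content_locale: str) -> str:
    """Return the JSON body of a .metadata.json sidecar file.

    Amazon Bedrock Knowledge Bases exposes each key under
    metadataAttributes as a filterable attribute at query time.
    """
    return json.dumps(
        {"metadataAttributes": {"contentLocale": content_locale}},
        indent=2,
    )


# Example: the sidecar body for a UK-locale document (illustrative locale).
print(build_metadata("en_GB"))
```

At retrieval time, the contentLocale attribute written here is what the equals filter matches against.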
Ring uses AWS Step Functions to orchestrate a daily workflow that creates a new versioned knowledge base, ingests updated content, and validates quality against evaluation datasets.
Each version maintains a separate Knowledge Base, giving Ring independent evaluation capabilities and straightforward rollback options.
The AWS Step Functions workflow continues using evaluation datasets to validate retrieval quality and the accuracy of contentLocale filtering. Ring uses the Anthropic Claude Sonnet 4 large language model (LLM) as a judge to score the generated responses before a version is promoted.

This architecture supports rollbacks to previous versions for up to 30 days. Because content is updated approximately 200 times per week, Ring decided not to maintain versions beyond 30 days.
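An LLM-as-a-judge step can be sketched as follows, assuming boto3's bedrock-runtime Converse API. The rubric wording, score scale, and the idea of returning a bare integer are illustrative assumptions, not Ring's actual evaluation configuration.

```python
def build_judge_prompt(question: str, reference: str, candidate: str) -> str:
    """Construct a grading prompt asking the judge model for a 1-5 score."""
    return (
        "You are grading a support chatbot answer.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Candidate answer: {candidate}\n"
        "Score the candidate from 1 (wrong) to 5 (fully correct and on-locale). "
        "Reply with the number only."
    )


def judge(question: str, reference: str, candidate: str, model_id: str) -> int:
    """Ask a Bedrock model to grade one answer (requires AWS credentials)."""
    import boto3

    client = boto3.client("bedrock-runtime")
    resp = client.converse(
        modelId=model_id,
        messages=[
            {
                "role": "user",
                "content": [
                    {"text": build_judge_prompt(question, reference, candidate)}
                ],
            }
        ],
    )
    # Parse the single number the prompt requested.
    return int(resp["output"]["message"]["content"][0]["text"].strip())
```

Running this once per evaluation-set entry, then averaging the scores, gives the quality gate that decides whether a new knowledge base version is promoted.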
Figure 2: Architecture diagram showing the Ring production chatbot system where customer queries flow through AWS Lambda to retrieve context from Knowledge Bases and generate responses using foundation models
Ring configured Lambda to process customer queries and retrieve relevant content from Amazon Bedrock Knowledge Bases. The function filters results on contentLocale tags using the equals operator for precise Regional content targeting. Ring's Lambda code follows this pattern:
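A minimal sketch of such a handler, assuming boto3's bedrock-agent-runtime Retrieve API; the knowledge base ID, result count, and event field names are placeholders, not Ring's actual values.

```python
def build_retrieval_config(content_locale: str, top_k: int = 5) -> dict:
    """Vector search configuration with an equals filter on contentLocale."""
    return {
        "vectorSearchConfiguration": {
            "numberOfResults": top_k,
            "filter": {
                "equals": {"key": "contentLocale", "value": content_locale}
            },
        }
    }


def lambda_handler(event, context):
    """Retrieve locale-scoped passages for a customer question."""
    import boto3

    client = boto3.client("bedrock-agent-runtime")
    resp = client.retrieve(
        knowledgeBaseId="KB_ID_PLACEHOLDER",  # hypothetical knowledge base ID
        retrievalQuery={"text": event["question"]},
        retrievalConfiguration=build_retrieval_config(event["contentLocale"]),
    )
    # Return just the passage text; the caller feeds this context to the LLM.
    return [r["content"]["text"] for r in resp["retrievalResults"]]
```

Because the filter is applied server-side by the knowledge base, a single retrieval call serves any locale; only the contentLocale value changes per request.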
In the Lambda function, the system applies the contentLocale filter at retrieval time, so each customer receives only the content published for their Region.

When building your own RAG-based system at scale, consider these architectural approaches and operational requirements beyond the core implementation:
The Ring implementation uses Amazon OpenSearch Serverless as the vector store for their knowledge bases. However, Amazon Bedrock Knowledge Bases also supports Amazon S3 Vectors as a vector store option. When choosing between these options, consider:
In addition to these two options, AWS supports integrations with other data store options, including Amazon Kendra, Amazon Neptune Analytics, and Amazon Aurora PostgreSQL. Evaluate your specific requirements around query complexity, cost optimization, and operational needs when selecting your vector store. The prescriptive guidance provides a good starting point to evaluate vector stores for your RAG use case.
While Ring implemented separate Knowledge Bases for each version, you might consider an alternative approach involving separate data sources for each version within a single knowledge base. This method leverages the x-amz-bedrock-kb-data-source-id filter parameter to target specific data sources during retrieval:
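Under the single-knowledge-base approach, a retrieval filter can combine the locale attribute with the built-in x-amz-bedrock-kb-data-source-id attribute to pin queries to one content version. A sketch, with a placeholder data source ID:

```python
def build_versioned_filter(content_locale: str, data_source_id: str) -> dict:
    """Combine a locale filter with a data-source (content version) filter.

    andAll requires every listed condition to match, so retrieval is scoped
    to one locale within one data source of the knowledge base.
    """
    return {
        "andAll": [
            {"equals": {"key": "contentLocale", "value": content_locale}},
            {
                "equals": {
                    "key": "x-amz-bedrock-kb-data-source-id",
                    "value": data_source_id,  # placeholder version/data source ID
                }
            },
        ]
    }
```

This dict drops into the same retrievalConfiguration filter slot as a plain equals filter, so switching versions means changing one ID rather than pointing at a different knowledge base.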
When choosing between these approaches, weigh the trade-offs: separate Knowledge Bases give stronger isolation between versions and simpler rollback, while separate data sources within a single knowledge base reduce the number of resources to provision and manage.
Consider your disaster recovery requirements when designing your RAG architecture. Amazon Bedrock Knowledge Bases are Regional resources, so to achieve robust disaster recovery, deploy your complete architecture across multiple Regions.
The centralized architecture serves its traffic from a single Region, prioritizing cost optimization over multi-region deployment. Evaluate your own Recovery Time Objective (RTO) and Recovery Point Objective (RPO) requirements to determine whether a multi-Region deployment is necessary for your use case.
Amazon Bedrock foundation models are Regional resources with Regional quotas. To handle traffic bursts and scale beyond single-Region quotas, Amazon Bedrock supports cross-Region inference (CRIS). CRIS automatically routes inference requests across multiple AWS Regions to increase throughput:
Geographic CRIS: Routes requests only within specific geographic boundaries (such as within the US or within the EU) to meet data residency requirements. This can provide up to double the default in-Region quotas.
Global CRIS: Routes requests across multiple commercial Regions worldwide, optimizing available resources and providing higher model throughput beyond geographic CRIS capabilities. Global CRIS automatically selects the optimal Region to process each request.
CRIS operates independently from your Knowledge Base deployment strategy. Even with a single-Region Knowledge Base deployment, you can configure CRIS to scale your foundation model throughput during traffic bursts. Note that CRIS applies only to the inference layer—your Knowledge Bases, S3 buckets, and orchestration logic remain Regional resources that require separate multi-Region deployment for disaster recovery.
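In practice, using CRIS means passing an inference profile ID instead of a bare model ID. A sketch of that convention, assuming the common geography-prefix pattern (for example, us., eu., or global.); the base model ID below is illustrative.

```python
def cris_profile_id(base_model_id: str, geography: str = "us") -> str:
    """Prefix a foundation model ID to form a CRIS inference profile ID.

    The resulting string is passed as modelId to the bedrock-runtime
    Converse or InvokeModel APIs in place of the plain model ID.
    """
    return f"{geography}.{base_model_id}"


# Example: an EU-scoped profile for an illustrative Claude model ID.
profile = cris_profile_id("anthropic.claude-3-5-sonnet-20240620-v1:0", "eu")
print(profile)
```

Check the Amazon Bedrock console or ListInferenceProfiles output for the exact profile IDs available in your account, since supported model and geography combinations vary.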
Selecting the appropriate embedding model and chunking strategy is important for RAG system performance because it directly affects retrieval accuracy and response quality. Ring uses the Amazon Titan Embeddings model with the default chunking strategy, which proved effective for their support documentation.
Amazon Bedrock offers flexibility with multiple options:

Embedding models: Amazon Bedrock Knowledge Bases supports Amazon Titan Text Embeddings as well as Cohere Embed models (English and Multilingual).

Chunking strategies: When ingesting data, Amazon Bedrock splits documents into manageable chunks for efficient retrieval. Available strategies include fixed-size chunking (the default), hierarchical chunking, semantic chunking, and no chunking for pre-chunked content.
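A chunking strategy is set per data source at creation time. Here is a sketch of a fixed-size configuration following the vectorIngestionConfiguration shape of the bedrock-agent CreateDataSource API; the token size and overlap below are illustrative defaults, not a recommendation for your content.

```python
def fixed_size_chunking(max_tokens: int = 300, overlap_pct: int = 20) -> dict:
    """Build the chunking portion of a data source's vectorIngestionConfiguration.

    maxTokens bounds each chunk's length; overlapPercentage repeats a slice
    of the previous chunk so context isn't lost at chunk boundaries.
    """
    return {
        "chunkingConfiguration": {
            "chunkingStrategy": "FIXED_SIZE",
            "fixedSizeChunkingConfiguration": {
                "maxTokens": max_tokens,
                "overlapPercentage": overlap_pct,
            },
        }
    }
```

This dict is passed as vectorIngestionConfiguration when creating the data source; longer, more narrative documents often benefit from larger chunks or hierarchical chunking instead.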
Evaluate your content characteristics to select the optimal combination for your specific use case.
In this post, we showed how Ring built a production-ready, multi-locale RAG-based support chatbot using Amazon Bedrock Knowledge Bases. The architecture combines automated content ingestion, systematic daily evaluation using an LLM-as-judge approach, and metadata-driven content targeting to achieve a 21% reduction in infrastructure and operational cost per additional locale, while maintaining consistent customer experiences across 10 international Regions.
Beyond the core RAG architecture, we covered key design considerations for production deployments: vector store selection, versioning strategies, multi-Region deployment for disaster recovery, cross-Region inference for scaling foundation model throughput, and embedding model selection and chunking strategies. These patterns apply broadly to any team building multi-locale or high-availability RAG systems on AWS.

Ring continues to evolve their chatbot architecture toward an agentic model with dynamic agent selection and integration of multiple specialized agents. This agentic approach will allow Ring to route customer inquiries to specialized agents for device troubleshooting, order management, and product recommendations, demonstrating the extensibility of RAG-based support systems built on Amazon Bedrock.
To learn more about Amazon Bedrock Knowledge Bases, visit the Amazon Bedrock documentation.