Extend Amazon Bedrock Agents to AWS hybrid and edge services for distributed RAG applications
With the general availability of Amazon Bedrock Agents, you can rapidly develop generative AI applications to run multi-step tasks across a myriad of enterprise systems and data sources. However, some geographies and regulated industries bound by data protection and privacy regulations have sought to combine generative AI services in the cloud with regulated data on premises. In this post, we show how to extend Amazon Bedrock Agents to hybrid and edge services such as AWS Outposts and AWS Local Zones to build distributed Retrieval Augmented Generation (RAG) applications with on-premises data for improved model outcomes. With Outposts, we also cover a reference pattern for a fully local RAG application that requires both the foundation model (FM) and data sources to reside on premises.
Organizations that process or store sensitive information such as personally identifiable information (PII) have asked for AWS Global Infrastructure options that address these localities, including mechanisms to make sure that data is stored and processed in compliance with local laws and regulations. Through AWS hybrid and edge services such as Local Zones and Outposts, you can benefit from the scalability and flexibility of the AWS Cloud together with the low latency and local processing capabilities of on-premises (or localized) infrastructure. This hybrid approach allows organizations to run applications and process data closer to the source, reducing latency, improving responsiveness for time-sensitive workloads, and adhering to data regulations.
Although architecting for data residency with an Outposts rack and Local Zone has been broadly discussed, generative AI and FMs introduce an additional set of architectural considerations. As generative AI models become increasingly powerful and ubiquitous, customers have asked us how they might consider deploying models closer to the devices, sensors, and end users generating and consuming data. Moreover, interest in small language models (SLMs) that enable resource-constrained devices to perform complex functions—such as natural language processing and predictive automation—is growing. To learn more about opportunities for customers to use SLMs, see Opportunities for telecoms with small language models: Insights from AWS and Meta on our AWS Industries blog.
Beyond SLMs, the interest in generative AI at the edge has been driven by two primary factors:
In this post, we cover two primary architectural patterns: fully local RAG and hybrid RAG.
For the deployment of a large language model (LLM) in a RAG use case on an Outposts rack, the LLM will be self-hosted on a G4dn instance and knowledge bases will be created on the Outpost rack, using either Amazon Elastic Block Store (Amazon EBS) or Amazon S3 on Outposts. The documents uploaded to the knowledge base on the rack might be private and sensitive documents, so they won’t be transferred to the AWS Region and will remain completely local on the Outpost rack. You can use a local vector database either hosted on Amazon Elastic Compute Cloud (Amazon EC2) or using Amazon Relational Database Service (Amazon RDS) for PostgreSQL on the Outpost rack with the pgvector extension to store embeddings. See the following figure for an example.
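As an illustrative sketch of the pgvector-backed store described above (the table name, column names, and embedding dimension are hypothetical, not taken from the workshop), the local vector database could be prepared as follows:

```sql
-- Enable the pgvector extension on the local RDS for PostgreSQL instance
CREATE EXTENSION IF NOT EXISTS vector;

-- Store document chunks alongside their embeddings
-- (dimension 384 assumes a small sentence-embedding model; adjust to yours)
CREATE TABLE documents (
    id        bigserial PRIMARY KEY,
    content   text NOT NULL,
    embedding vector(384)
);

-- Retrieve the top 5 chunks nearest to a query embedding passed as $1
-- (<=> is pgvector's cosine distance operator)
SELECT content
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 5;
```

Because everything here runs on the Outpost rack, both the raw documents and their embeddings stay on premises.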
Certain customers are required by data protection or privacy regulations to keep their data within specific state boundaries. To align with these requirements and still use such data for generative AI, customers with hybrid and edge environments need to host their FMs in both a Region and at the edge. This setup enables you to use data for generative AI purposes while remaining compliant with security regulations. To orchestrate the behavior of such a distributed system, you need a system that can understand the nuances of your prompt and direct you to the right FM running in a compliant environment. Amazon Bedrock Agents makes this kind of distributed system possible in hybrid environments.
Amazon Bedrock Agents enables you to build and configure autonomous agents in your application. Agents orchestrate interactions between FMs, data sources, software applications, and user conversations. The orchestration includes the ability to invoke AWS Lambda functions to invoke other FMs, opening the ability to run self-managed FMs at the edge. With this mechanism, you can build distributed RAG applications for highly regulated industries subject to data residency requirements. In the hybrid deployment scenario, in response to a customer prompt, Amazon Bedrock can perform some actions in a specified Region and defer other actions to a self-hosted FM in a Local Zone. The following example illustrates the hybrid RAG high-level architecture.
In the following sections, we dive deep into both solutions and their implementation.
To start, you need to configure your virtual private cloud (VPC) with an edge subnet on the Outpost rack. To create an edge subnet on the Outpost, you need to find the Outpost Amazon Resource Name (ARN) on which you want to create the subnet, as well as the Availability Zone of the Outpost. After you create the internet gateway, route tables, and subnet associations, launch a series of EC2 instances on the Outpost rack to run your RAG application, including the following components.
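As a sketch of the setup steps above, the Outpost ARN and Availability Zone can be looked up and the edge subnet created with the AWS CLI; the VPC ID, CIDR block, Availability Zone, and ARN below are placeholders for your own values:

```shell
# Look up the Outpost ARN and its Availability Zone
aws outposts list-outposts \
  --query "Outposts[].{Arn:OutpostArn,Az:AvailabilityZone}"

# Create the edge subnet on the Outpost rack (all values are placeholders)
aws ec2 create-subnet \
  --vpc-id vpc-0123456789abcdef0 \
  --cidr-block 10.0.1.0/24 \
  --availability-zone us-east-1a \
  --outpost-arn arn:aws:outposts:us-east-1:111122223333:outpost/op-0123456789abcdef0
```

After the subnet exists, attach it to your route table and internet gateway as you would for any other subnet, then launch the EC2 instances for the RAG components into it.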
Optionally, your underlying proprietary data sources can be stored on Amazon Simple Storage Service (Amazon S3) on Outposts or using Amazon S3-compatible solutions running on Amazon EC2 instances with EBS volumes.
The components intercommunicate through the traffic flow illustrated in the following figure.
The workflow consists of the following steps:
To learn more about the fully local RAG application or get hands-on with the sample application, see Module 2 of our public AWS Workshop: Hands-on with Generative AI on AWS Hybrid & Edge Services.
To start, you need to configure a VPC with an edge subnet, corresponding to either an Outpost rack or a Local Zone depending on the use case. After you create the internet gateway, route tables, and subnet associations, launch an EC2 instance on the Outpost rack (or Local Zone) to run your hybrid RAG application. On the EC2 instance itself, you can reuse the same components as the fully local RAG: a vector store, a backend API server, an embedding model, and a local LLM.
In this architecture, we rely heavily on managed services such as Lambda and Amazon Bedrock because only select FMs and knowledge bases corresponding to the heavily regulated data, rather than the orchestrator itself, are required to live at the edge. To do so, we will extend the existing Amazon Bedrock Agents workflows to the edge using a sample FM-powered customer service bot.
In this example, the customer service bot is a shoe retail assistant that supports customers purchasing shoes by offering options in a human-like conversation. We also assume that the knowledge base surrounding the practice of shoemaking is proprietary and, therefore, resides at the edge. As a result, questions surrounding shoemaking will be addressed by the knowledge base and local FM running at the edge.
To make sure that the user prompt is effectively proxied to the right FM, we rely on Amazon Bedrock Agents action groups. An action group defines actions that the agent can perform, such as place_order or check_inventory. In our example, we could define an additional action within an existing action group called hybrid_rag or learn_shoemaking that specifically addresses prompts that can only be answered by the AWS hybrid and edge locations.
As part of the agent’s InvokeAgent API, an agent interprets the prompt (such as “How is leather used for shoemaking?”) with an FM and generates the logic for the next step it should take, including a prediction for the most prudent action in an action group. In this example, we want the prompt “Hello, I would like recommendations to purchase some shoes.” to be directed to the /check_inventory action group, whereas the prompt “How is leather used for shoemaking?” should be directed to the /hybrid_rag action group.
The following diagram illustrates this orchestration, which is implemented by the orchestration phase of the Amazon Bedrock agent.
To create the additional edge-specific action group, the new OpenAPI schema must reflect the new action, hybrid_rag, with a detailed description, structure, and parameters that define it as an API operation specifically focused on a data domain only available in a specific edge location.
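A minimal sketch of such a schema fragment might look like the following; the path, operationId, and field names are illustrative, not taken from the workshop materials:

```json
{
  "paths": {
    "/hybrid_rag": {
      "post": {
        "operationId": "hybrid_rag",
        "description": "Answers shoemaking questions using the proprietary knowledge base and FM hosted at the edge (Local Zone or Outpost).",
        "requestBody": {
          "required": true,
          "content": {
            "application/json": {
              "schema": {
                "type": "object",
                "properties": {
                  "question": {
                    "type": "string",
                    "description": "The user's shoemaking question"
                  }
                },
                "required": ["question"]
              }
            }
          }
        },
        "responses": {
          "200": {
            "description": "Answer generated by the edge-hosted FM",
            "content": {
              "application/json": {
                "schema": {
                  "type": "object",
                  "properties": {
                    "answer": { "type": "string" }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
```

The detailed description field matters here: the agent uses it during orchestration to decide which action group a prompt belongs to.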
After you define an action group using the OpenAPI specification, you can define a Lambda function to program the business logic for an action group. This Lambda handler (see the following code) might include supporting functions (such as queryEdgeModel) for the individual business logic corresponding to each action group.
However, in the action group corresponding to the edge LLM (as seen in the code below), the business logic won’t include Region-based FM invocations, such as using Amazon Bedrock APIs. Instead, the customer-managed endpoint will be invoked, for example using the private IP address of the EC2 instance hosting the edge FM in a Local Zone or Outpost. This way, AWS native services such as Lambda and Amazon Bedrock can orchestrate complicated hybrid and edge RAG workflows.
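The handler below is a hedged sketch rather than the workshop’s actual code: the endpoint URL, the queryEdgeModel request and response fields, and the fallback branch are assumptions. It shows the overall shape of an action group Lambda function that forwards /hybrid_rag prompts to a customer-managed edge endpoint instead of invoking a Region-based Amazon Bedrock model:

```python
import json
import urllib.request

# Hypothetical private IP of the EC2 instance hosting the edge FM in a
# Local Zone or on an Outpost; replace with your own endpoint.
EDGE_FM_ENDPOINT = "http://10.0.1.25:8080/v1/generate"


def queryEdgeModel(prompt: str) -> str:
    """Forward the prompt to the self-hosted FM at the edge instead of
    making a Region-based Amazon Bedrock model invocation."""
    payload = json.dumps({"prompt": prompt}).encode("utf-8")
    req = urllib.request.Request(
        EDGE_FM_ENDPOINT,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        # The "completion" field is an assumption about the edge server's API.
        return json.loads(resp.read())["completion"]


def build_agent_response(event: dict, body_text: str) -> dict:
    """Shape the reply in the format an Amazon Bedrock Agents action group
    expects back from its Lambda function."""
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event["actionGroup"],
            "apiPath": event["apiPath"],
            "httpMethod": event["httpMethod"],
            "httpStatusCode": 200,
            "responseBody": {
                "application/json": {"body": json.dumps({"answer": body_text})}
            },
        },
    }


def lambda_handler(event, context):
    if event["apiPath"] == "/hybrid_rag":
        # Proprietary shoemaking questions stay at the edge.
        answer = queryEdgeModel(event["inputText"])
    else:
        # Other action groups (ordering, inventory) are handled elsewhere.
        answer = "This action is handled by another action group."
    return build_agent_response(event, answer)
```

Because the edge invocation is just an HTTP call to a private IP, the same handler pattern works whether the FM runs in a Local Zone or on an Outpost rack.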
After the solution is fully deployed, you can visit the chat playground feature on the Amazon Bedrock Agents console and ask the question, “How are the rubber heels of shoes made?” Even though most of the prompts will be exclusively focused on retail customer service operations for ordering shoes, the native orchestration support by Amazon Bedrock Agents seamlessly directs the prompt to your edge FM running the LLM for shoemaking.
To learn more about this hybrid RAG application or get hands-on with the cross-environment application, refer to Module 1 of our public AWS Workshop: Hands-on with Generative AI on AWS Hybrid & Edge Services.
In this post, we demonstrated how to extend Amazon Bedrock Agents to AWS hybrid and edge services, such as Local Zones or Outposts, to build distributed RAG applications in highly regulated industries subject to data residency requirements. Moreover, for 100% local deployments to align with the most stringent data residency requirements, we presented architectures converging the knowledge base, compute, and LLM within the Outposts hardware itself.
To get started with both architectures, visit AWS Workshops. To get started with our newly released workshop, see Hands-on with Generative AI on AWS Hybrid & Edge Services. Additionally, check out other AWS hybrid cloud solutions or reach out to your local AWS account team to learn how to get started with Local Zones or Outposts.