Amazon Bedrock Knowledge Bases has extended its vector store options by enabling support for Amazon OpenSearch Service managed clusters, further strengthening its capabilities as a fully managed Retrieval Augmented Generation (RAG) solution. This enhancement builds on the core functionality of Amazon Bedrock Knowledge Bases, which is designed to connect foundation models (FMs) with internal data sources. Amazon Bedrock Knowledge Bases automates critical processes such as data ingestion, chunking, embedding generation, and vector storage, and applies advanced indexing algorithms and retrieval techniques, so that users can develop intelligent applications with minimal effort.
The latest update broadens the vector database options available to users. In addition to the previously supported vector stores such as Amazon OpenSearch Serverless, Amazon Aurora PostgreSQL-Compatible Edition, Amazon Neptune Analytics, Pinecone, MongoDB, and Redis Enterprise Cloud, users can now use OpenSearch Service managed clusters. This integration enables the use of an OpenSearch Service domain as a robust backend for storing and retrieving vector embeddings, offering greater flexibility and choice in vector storage solutions.
To help users take full advantage of this new integration, this post provides a comprehensive, step-by-step guide on integrating an Amazon Bedrock knowledge base with an OpenSearch Service managed cluster as its vector store.
OpenSearch Service provides two complementary deployment options for vector workloads: managed clusters and serverless collections. Both harness the powerful vector search and retrieval capabilities of OpenSearch Service, though each excels in different scenarios. Managed clusters offer extensive configuration flexibility, performance tuning options, and scalability that make them particularly well-suited for enterprise-grade AI applications. Organizations seeking greater control over cluster configurations, compute instances, the ability to fine-tune performance and cost, and support for a wider range of OpenSearch features and API operations will find managed clusters a natural fit for their use cases. Alternatively, OpenSearch Serverless excels in use cases that require automatic scaling and capacity management, simplified operations without the need to manage clusters or nodes, automatic software updates, and built-in high availability and redundancy. The optimal choice depends entirely on specific use case, operational model, and technical requirements. Here are some key reasons why OpenSearch Service managed clusters offer a compelling choice for organizations:
Before we dive into the setup, make sure you have the following prerequisites in place:
This section covers the following high-level steps to integrate an OpenSearch Service managed cluster with Amazon Bedrock Knowledge Bases:
The following diagram illustrates these steps:
Here are the steps to follow in the AWS console to integrate Amazon Bedrock Knowledge Bases with OpenSearch Service Managed Cluster.
Before creating an OpenSearch Service domain, you need to create two key IAM resources: a dedicated IAM admin user and a master role. This approach facilitates proper access management for your OpenSearch Service domain, particularly when implementing fine-grained access control, which is strongly recommended for production environments. This user and role will have the necessary permissions to create, configure, and manage the OpenSearch Service domain and its integration with Amazon Bedrock Knowledge Bases.
The administrative user serves as the principal account for managing the OpenSearch Service configuration. To create an IAM admin user, follow these steps:
Name the user opensearch-admin.
Attach the AmazonOpenSearchServiceFullAccess managed policy, which grants comprehensive permissions for OpenSearch Service operations.
After creating the user, copy and save the user’s Amazon Resource Name (ARN) for later use in domain configuration. The ARN will look like this, with <ACCOUNT_ID> replaced by your AWS account ID:
arn:aws:iam::<ACCOUNT_ID>:user/opensearch-admin
With OpenSearch Service, you can assign a master user for domains with fine-grained access control. By configuring an IAM role as the master user, you can manage access using trusted principals and avoid static usernames and passwords. To create the IAM role, follow these steps:
Select a trusted principal, the opensearch-admin user, to assume this role.
Attach the same AmazonOpenSearchServiceFullAccess managed policy you used for your admin user.
Name the role OpenSearchMasterRole and choose Create role.
After the role is created, navigate to its summary page and copy the role’s ARN. You’ll need this ARN when configuring your OpenSearch Service domain’s master user.
arn:aws:iam::<ACCOUNT_ID>:role/OpenSearchMasterRole
With the administrative IAM role established, the next step is to create the OpenSearch Service domain that will serve as the vector store for your Amazon Bedrock knowledge base. This involves configuring the domain’s engine, network access, and, most importantly, its security settings using fine-grained access control.
For the domain name, this walkthrough uses bedrock-kb-domain.
If your workload demands higher input/output operations per second (IOPS) or throughput or involves managing substantial volumes of data, selecting Standard create is recommended. With this option enabled, you can customize instance types, storage configurations, and advanced security settings to optimize the speed and efficiency of data storage and retrieval operations, making it well-suited for production environments. For example, you can scale the baseline GP3 volume performance from 3,000 IOPS and 125 MiB/s throughput up to 16,000 IOPS and 1,000 MiB/s throughput for every 3 TiB of storage provisioned per data node. This flexibility means that you can align your OpenSearch Service domain performance with specific workload demands, facilitating efficient indexing and retrieval operations for high-throughput or large-scale applications. These settings should be fine-tuned based on the size and complexity of your OpenSearch Service workload to optimize both performance and cost.
However, although increasing your domain’s throughput and storage settings can help improve domain performance—and might help mitigate ingestion errors caused by storage or node-level bottlenecks—it doesn’t increase the ingestion speed into Amazon Bedrock Knowledge Bases as of this writing. Knowledge base ingestion operates at a fixed throughput rate for customers and vector databases, regardless of underlying domain configuration. AWS continues to invest in scaling and evolving the ingestion capabilities of Bedrock Knowledge Bases, and future improvements might offer greater flexibility.
In the fine-grained access control implementation section, we guide you through creating a custom OpenSearch Service role with specific index and cluster permissions, then authorizing Amazon Bedrock Knowledge Bases by associating its service role with this custom role. This mapping establishes a trust relationship that restricts Bedrock Knowledge Bases to only the operations you’ve explicitly permitted when accessing your OpenSearch Service domain with its service credentials, facilitating secure and controlled integration.
When enabling fine-grained access control, you must select a master user to manage the domain. You have two options:
For this walkthrough, we choose Set IAM ARN as master user. This is the recommended approach for production environments because it integrates with your existing AWS identity framework, providing better auditability and security management.
In the text box, paste the ARN of the OpenSearchMasterRole that you created in the first step, as shown in the following screenshot. This designates the IAM role as the superuser for your OpenSearch Service domain, granting it full permissions to manage users, roles, and permissions within OpenSearch Dashboards.
Although setting an IAM master user is ideal for programmatic access, it’s not convenient for allowing users to log in to OpenSearch Dashboards. In a subsequent step, after the domain is created and we’ve configured Amazon Cognito resources, we’ll revisit this security configuration to enable Amazon Cognito authentication. Then you’ll be able to create a user-friendly login experience for OpenSearch Dashboards, where users can sign in through a hosted UI and be automatically mapped to IAM roles (such as the OpenSearchMasterRole or more limited roles), combining ease of use with robust, role-based security. For now, proceed with the IAM ARN as the master user to complete the initial domain setup.
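If you script domain creation instead of using the console, the master user choice described above corresponds to a handful of security arguments of the OpenSearch Service CreateDomain API. The following is a minimal sketch under stated assumptions, not the procedure this post follows: the helper name `build_security_options` is hypothetical, the account ID is a dummy value, and the companion settings (HTTPS enforcement, node-to-node encryption, encryption at rest) are included because fine-grained access control requires them.

```python
import json


def build_security_options(master_role_arn: str) -> dict:
    """Return CreateDomain keyword arguments that enable fine-grained
    access control with an IAM role as the master user."""
    if not master_role_arn.startswith("arn:aws:iam::"):
        raise ValueError("expected an IAM role ARN")
    return {
        "AdvancedSecurityOptions": {
            "Enabled": True,
            # False means the master user is an IAM ARN rather than a
            # user stored in the domain's internal user database.
            "InternalUserDatabaseEnabled": False,
            "MasterUserOptions": {"MasterUserARN": master_role_arn},
        },
        # Settings that fine-grained access control depends on.
        "DomainEndpointOptions": {"EnforceHTTPS": True},
        "NodeToNodeEncryptionOptions": {"Enabled": True},
        "EncryptionAtRestOptions": {"Enabled": True},
    }


options = build_security_options(
    "arn:aws:iam::111122223333:role/OpenSearchMasterRole"  # dummy account ID
)
print(json.dumps(options, indent=2))
```

With boto3, a dictionary like this could be unpacked into `create_domain` on the `opensearch` client alongside the domain name, engine version, and cluster configuration, which are omitted here.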
After your domain becomes active, navigate to its detail page to retrieve the following information:
Domain endpoint: https://search-<domain-name>-<unique-identifier>.<region>.es.amazonaws.com
Domain ARN: arn:aws:es:<region>:<account-id>:domain/<domain-name>
Make sure to copy and securely store both of these details because you’ll need them when configuring your Amazon Bedrock knowledge base in subsequent steps. With the OpenSearch Service domain up and running, you now have an empty cluster ready to store your vector embeddings. Next, we set up Amazon Cognito authentication for OpenSearch Dashboards, and then configure a vector index within this domain.
Following the creation of your OpenSearch Service domain, the next step is to configure an Amazon Cognito user pool. This user pool will provide a secure and user-friendly authentication layer for accessing the OpenSearch Dashboards. Follow these steps:
Application name: opensearch-kb-app. This name will automatically become your app client name.
OpenSearch Dashboards URL for the application: https://search-<domain-name>-<unique-identifier>.aos.<region>.on.aws/_dashboards
The simplified interface automatically configures optimal settings for your selected application type, including appropriate security policies, OAuth flows, and hosted UI domain generation. Copy and save the User pool ID and App client ID values. You’ll need them to configure the Cognito identity pool and update the OpenSearch Service domain’s security settings.
After creating your Amazon Cognito user pool, you need to add an administrator user who will have access to OpenSearch Dashboards. Follow these steps:
Example administrator user: admin@example.com
Upon the administrator’s first login, they’ll be prompted to create a permanent password. When all the subsequent setup steps are complete, this admin user will be able to authenticate to OpenSearch Dashboards.
With your Amazon Cognito user pool created, the next step is to configure the app client parameters that integrate it with OpenSearch Dashboards. The app client configuration defines how OpenSearch Dashboards will interact with the Cognito authentication system, including callback URLs, OAuth flows, and scope permissions. Follow these steps:
Save the configuration by choosing Save changes at the bottom of the page to apply the OAuth settings to your app client. The system will validate your configuration and confirm the updates have been successfully applied.
Before creating the Cognito identity pool, you must first update your existing OpenSearchMasterRole to trust the Cognito identity service. This is required because only IAM roles with the proper trust policy for cognito-identity.amazonaws.com will appear in the Identity pool role selection dropdown list. Follow these steps:
In the trust policy, replace YOUR_ACCOUNT_ID with your AWS account number. Leave PLACEHOLDER_IDENTITY_POOL_ID as is for now. You’ll update this in Step 6 after creating the identity pool.
The identity pool serves as a bridge between your Cognito user pool authentication and AWS IAM roles so that authenticated users can assume specific IAM permissions when accessing your OpenSearch Service domain. This configuration is essential for mapping Cognito authenticated users to the appropriate OpenSearch Service access permissions. This step primarily configures administrative access to the OpenSearch Dashboards, allowing domain administrators to manage users, roles, and domain settings through a secure web interface. Follow these steps:
Assign the OpenSearchMasterRole that you created in Establish administrative access with IAM master user and role. This assignment grants authenticated users the comprehensive permissions defined in your master role. This configuration provides full administrative access to your OpenSearch Service domain. Users who authenticate through this Cognito setup will have master-level permissions, making this suitable for domain administrators who need to configure security settings, manage users, and perform maintenance tasks.
The identity pool grants the OpenSearchMasterRole to authenticated users from this user pool.
Name the identity pool OpenSearchIdentityPool.
To update your master role’s trust policy with the identity pool ID, follow these steps:
Replace PLACEHOLDER_IDENTITY_POOL_ID with your identity pool ID from the previous step.
Your authentication infrastructure is now configured to provide secure, administrative access to OpenSearch Dashboards through Amazon Cognito authentication. Users who authenticate through the Cognito user pool will assume the master role and gain full administrative capabilities for your OpenSearch Service domain.
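To make the two-stage trust policy update concrete, the following sketch builds a trust policy for OpenSearchMasterRole first with the placeholder and then with a real identity pool ID. It is an illustration, not the exact policy from this walkthrough: the function name, the dummy account ID, and the dummy identity pool ID are assumptions, and the statement layout follows the usual pattern for roles assumed through Cognito identity pools.

```python
import json

PLACEHOLDER = "PLACEHOLDER_IDENTITY_POOL_ID"


def build_master_role_trust_policy(account_id: str,
                                   identity_pool_id: str = PLACEHOLDER) -> dict:
    """Trust policy that lets the admin user and Cognito-authenticated
    identities assume OpenSearchMasterRole."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # The opensearch-admin IAM user can assume the role directly.
                "Effect": "Allow",
                "Principal": {
                    "AWS": f"arn:aws:iam::{account_id}:user/opensearch-admin"
                },
                "Action": "sts:AssumeRole",
            },
            {
                # Authenticated identities from one identity pool can assume it.
                "Effect": "Allow",
                "Principal": {"Federated": "cognito-identity.amazonaws.com"},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        "cognito-identity.amazonaws.com:aud": identity_pool_id
                    },
                    "ForAnyValue:StringLike": {
                        "cognito-identity.amazonaws.com:amr": "authenticated"
                    },
                },
            },
        ],
    }


# Before the identity pool exists: keep the placeholder.
draft = build_master_role_trust_policy("111122223333")
# After creating the pool: substitute its ID (dummy value shown).
final = build_master_role_trust_policy(
    "111122223333", "us-east-1:00000000-0000-0000-0000-000000000000"
)
print(json.dumps(final, indent=2))
```

Generating both versions from one function keeps the only difference between them to the `aud` condition, which is the value the second update changes.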
After setting up your Cognito user pool, app client, and identity pool, the next step is to configure your OpenSearch Service domain to use Cognito authentication for OpenSearch Dashboards. Follow these steps:
The domain will update its configuration, which might take several minutes. You’ll receive a progress pop-up, as shown in the following screenshot.
This step involves creating a vector search–enabled index in your OpenSearch Service domain for Amazon Bedrock to store document embedding vectors, text chunks, and metadata. The index must contain three essential fields: an embedding vector field that stores numerical representations of your content (in floating-point or binary format), a text field that holds the raw text chunks, and a field for Amazon Bedrock managed metadata where Amazon Bedrock tracks critical information such as document IDs and source attributions. With proper index mapping, Amazon Bedrock Knowledge Bases can efficiently store and retrieve the components of your document data.
You create this index using the Dev Tools feature in OpenSearch Dashboards. To access Dev Tools in OpenSearch Dashboards, follow these steps:
Sign in with the administrator user you added to the Cognito user pool (for example, admin@example.com).
To define and create the index, copy the following command into the Dev Tools console and replace bedrock-kb-index with your preferred index name if needed. If you’re setting up a binary vector index (for example, to use binary embeddings with Amazon Titan Text Embeddings V2), include the additional required fields in your index mapping:
"data_type": "binary" for the vector field
"space_type": "hamming" (instead of "l2", which is used for float embeddings)
For more details, refer to the Amazon Bedrock Knowledge Bases setup documentation.
The key components of this index mapping are:
The knn_vector field type for the vector field.
The embeddings field for storing vector data, specifying dimension, space type, and data type based on the chosen embedding model. It’s critical to match the dimension with the embedding model’s output. Amazon Bedrock Knowledge Bases offers models such as Amazon Titan Embeddings V2 (with 256, 512, or 1,024 dimensions) and Cohere Embed (1,024 dimensions). For example, using Amazon Titan Embeddings V2 with 1,024 dimensions requires setting dimension: 1024 in the mapping. A mismatch between the model’s vector size and index mapping will cause ingestion failures, so it’s crucial to verify this value.
After pasting the command into the Dev Tools console, choose Run. If successful, you’ll receive a response similar to the one shown in the following screenshot.
Now, you should have a new index (for example, named bedrock-kb-index) on your domain with the preceding mapping. Make a note of the index name you created, the vector field name (embeddings), the text field name (AMAZON_BEDROCK_TEXT_CHUNK), and the metadata field name (AMAZON_BEDROCK_METADATA). In the next steps, you’ll grant Amazon Bedrock permission to use this index and then plug these details into the Amazon Bedrock Knowledge Bases setup.
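The index requirements described in this section can be sketched as a small helper that produces a Dev Tools style request for either float or binary embeddings. Treat it as an illustration rather than the exact command from this walkthrough: the field names, `l2`, `hamming`, and `data_type` come from the text above, while the `hnsw` method, the `faiss` engine, the non-indexed metadata field, and the function name are assumptions you should check against the Amazon Bedrock Knowledge Bases setup documentation.

```python
import json


def build_index_body(dimension: int, binary: bool = False) -> dict:
    """Build an index body with the three fields a knowledge base needs."""
    if dimension <= 0:
        raise ValueError("dimension must be positive")
    if binary and dimension % 8 != 0:
        # OpenSearch binary vectors pack 8 dimensions per byte.
        raise ValueError("binary vector dimension must be a multiple of 8")

    vector_field = {
        "type": "knn_vector",
        "dimension": dimension,  # must equal the embedding model's output size
        "method": {
            "name": "hnsw",      # assumption: not stated in the post
            "engine": "faiss",   # assumption: not stated in the post
            "space_type": "hamming" if binary else "l2",
        },
    }
    if binary:
        vector_field["data_type"] = "binary"

    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "embeddings": vector_field,
                "AMAZON_BEDROCK_TEXT_CHUNK": {"type": "text"},
                # Assumption: metadata is stored but not indexed for search.
                "AMAZON_BEDROCK_METADATA": {"type": "text", "index": False},
            }
        },
    }


index_name = "bedrock-kb-index"
body = build_index_body(dimension=1024)
# Print in the form accepted by the Dev Tools console.
print(f"PUT /{index_name}")
print(json.dumps(body, indent=2))
```

Calling `build_index_body(1024, binary=True)` yields the binary variant, with `data_type` set to `binary` and `space_type` set to `hamming`.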
With the vector index successfully created, your OpenSearch Service domain is now ready to store and retrieve embedding vectors. Next, you’ll configure IAM roles and access policies to facilitate secure interaction between Amazon Bedrock and your OpenSearch Service domain.
Now that your OpenSearch Service domain and vector index are ready, it’s time to configure an Amazon Bedrock knowledge base to use this vector store. In this step, you will:
We will pause the knowledge base creation midway to update OpenSearch Service access policies before finalizing the setup.
To create the Amazon Bedrock knowledge base in the console, follow these steps. For detailed instructions, refer to Create a knowledge base by connecting to a data source in Amazon Bedrock Knowledge Bases in the AWS documentation. The following steps provide a streamlined overview of the general process:
To configure OpenSearch Service Managed Cluster as the vector store, follow these steps:
You must not choose Create yet. Amazon Bedrock will be ready to create the knowledge base, but you need to configure OpenSearch Service access permissions first. Copy the ARN of the new IAM service role that Amazon Bedrock will use for this knowledge base (the console will display the role ARN you selected or just created). Keep this ARN handy and leave the Amazon Bedrock console open (pause the creation process here).
With the IAM service role ARN copied, configure fine-grained permissions in OpenSearch Dashboards. Fine-grained access control provides role-based permission management at a granular level (indices, documents, and fields), so that your Amazon Bedrock knowledge base has precisely controlled access. Follow these steps:
Open OpenSearch Dashboards at https://<your-domain-endpoint>/_dashboards/.
Create a role named bedrock-knowledgebase-role.
Scope the role’s index permissions to your vector index (for example, bedrock-kb-index).
To map the Amazon Bedrock IAM service role (copied earlier) to the newly created OpenSearch Service role, follow these steps:
Open the role you created (bedrock-knowledgebase-role).
Map the Amazon Bedrock service role ARN to it (for example, arn:aws:iam::<accountId>:role/service-role/BedrockKnowledgeBaseRole). When mapping this IAM role to an OpenSearch Service role, the IAM role doesn’t need to exist in your AWS account at the time of mapping. You’re referencing its ARN to establish the association within the OpenSearch backend. This allows OpenSearch Service to recognize and authorize the role when it’s eventually created and used. Make sure that the ARN is correctly specified to facilitate proper permission mapping.
With fine-grained permissions in place, return to the paused Amazon Bedrock console to finalize your knowledge base setup. Confirm that all OpenSearch Service domain details are correctly entered, including the domain endpoint, domain ARN, index name, vector field name, text field name, and metadata field name. Choose Create knowledge base.
Amazon Bedrock will use the configured IAM service role to securely connect to your OpenSearch Service domain. After the setup is complete, the knowledge base status should change to Available, confirming successful integration.
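The role and role mapping configured in OpenSearch Dashboards earlier can also be expressed as request bodies for the OpenSearch Security plugin REST API, which is useful if you want to keep the configuration in version control. This is a hedged sketch: the helper names are hypothetical, the account ID is a dummy value, and the action names passed in below are illustrative examples of OpenSearch permissions, not the complete set that Amazon Bedrock Knowledge Bases requires.

```python
import json


def build_role(index_name, cluster_permissions, index_permissions) -> dict:
    """Body for PUT _plugins/_security/api/roles/<role-name>."""
    return {
        "cluster_permissions": list(cluster_permissions),
        "index_permissions": [
            {
                # Restrict the role to the knowledge base index only.
                "index_patterns": [index_name],
                "allowed_actions": list(index_permissions),
            }
        ],
    }


def build_role_mapping(iam_role_arn: str) -> dict:
    """Body for PUT _plugins/_security/api/rolesmapping/<role-name>."""
    # IAM role ARNs are mapped as backend roles.
    return {"backend_roles": [iam_role_arn]}


role_name = "bedrock-knowledgebase-role"
role = build_role(
    "bedrock-kb-index",
    cluster_permissions=["indices:data/write/bulk*"],  # illustrative only
    index_permissions=[                                # illustrative only
        "indices:data/read/search",
        "indices:data/write/index",
    ],
)
mapping = build_role_mapping(
    "arn:aws:iam::111122223333:role/service-role/BedrockKnowledgeBaseRole"
)

print(f"PUT _plugins/_security/api/roles/{role_name}")
print(json.dumps(role, indent=2))
print(f"PUT _plugins/_security/api/rolesmapping/{role_name}")
print(json.dumps(mapping, indent=2))
```

Both printed requests can be pasted into the Dev Tools console by a user with master-level permissions.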
When integrating OpenSearch Service Managed Cluster with Amazon Bedrock Knowledge Bases, it’s important to understand how access control works across different layers.
For same-account configurations (where both the knowledge base and OpenSearch Service domain are in the same AWS account), no updates to the OpenSearch Service domain’s resource-based policy are required as long as fine-grained access control is enabled and your IAM role is correctly mapped. In this case, IAM permissions and fine-grained access control mappings are sufficient to authorize access. However, if the domain’s resource-based policy includes deny statements targeting your knowledge base service role or principals, access will be blocked—regardless of IAM or fine-grained access control settings. To avoid unintended failures, make sure the policy doesn’t explicitly restrict access to the Amazon Bedrock Knowledge Bases service role.
For cross-account access (when the IAM role used by Amazon Bedrock Knowledge Bases belongs to a different AWS account than the OpenSearch Service domain), you must include an explicit allow statement in the domain’s resource-based policy for the external role. Without this, access will be denied even if all other permissions are correctly configured.
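For the cross-account case, the explicit allow statement might look like the policy built by the following sketch. The function name, both dummy account IDs, and the choice of the `es:ESHttp*` action wildcard are assumptions for illustration. Narrow the actions to what your knowledge base actually needs.

```python
import json


def build_cross_account_policy(external_role_arn: str, domain_arn: str) -> dict:
    """Resource-based domain policy that allows one external IAM role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The knowledge base service role in the other account.
                "Principal": {"AWS": external_role_arn},
                # Assumption: allow all HTTP methods on the domain.
                "Action": "es:ESHttp*",
                # "/*" covers the indexes and APIs under the domain.
                "Resource": f"{domain_arn}/*",
            }
        ],
    }


policy = build_cross_account_policy(
    external_role_arn=(
        "arn:aws:iam::444455556666:role/service-role/BedrockKnowledgeBaseRole"
    ),
    domain_arn="arn:aws:es:us-east-1:111122223333:domain/bedrock-kb-domain",
)
print(json.dumps(policy, indent=2))
```

In the same-account case no such statement is needed, so a policy like this would only be attached to the domain when the role lives in a different account.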
To begin using your knowledge base, select your configured data source and initiate the sync process. This action starts the ingestion of your Amazon S3 data. After synchronization is complete, your knowledge base is ready for information retrieval.
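The sync can also be started programmatically. The sketch below wraps the `start_ingestion_job` and `get_ingestion_job` operations of the Amazon Bedrock Agent API in a polling loop. It takes the client as a parameter so it can be exercised without AWS access. The function name, the polling defaults, and the set of terminal status names are assumptions, and the knowledge base and data source IDs in the usage comment are hypothetical.

```python
import time

# Assumed terminal statuses of an ingestion job.
TERMINAL_STATES = {"COMPLETE", "FAILED", "STOPPED"}


def sync_data_source(client, knowledge_base_id: str, data_source_id: str,
                     poll_seconds: float = 10.0, max_polls: int = 360) -> str:
    """Start an ingestion job and wait until it reaches a terminal status."""
    job = client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
    )["ingestionJob"]

    for _ in range(max_polls):
        if job["status"] in TERMINAL_STATES:
            return job["status"]
        time.sleep(poll_seconds)
        # Refresh the job description to read the latest status.
        job = client.get_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            ingestionJobId=job["ingestionJobId"],
        )["ingestionJob"]

    if job["status"] in TERMINAL_STATES:
        return job["status"]
    raise TimeoutError("ingestion job did not finish in time")


# Usage with boto3 (IDs are hypothetical):
#   import boto3
#   client = boto3.client("bedrock-agent")
#   print(sync_data_source(client, "KBID12345", "DSID12345"))
```

Because ingestion throughput is fixed regardless of domain configuration, a generous polling window is a safer default than a short timeout for large data sources.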
Integrating Amazon Bedrock Knowledge Bases with OpenSearch Service Managed Cluster offers a powerful solution for vector storage and retrieval in AI applications. In this post, we walked you through the process of setting up an OpenSearch Service domain, configuring a vector index, and connecting it to an Amazon Bedrock knowledge base. With this setup, you’re now equipped to use the full potential of vector search capabilities in your AI-driven applications, enhancing your ability to process and retrieve information from large datasets efficiently.
Get started with Amazon Bedrock Knowledge Bases and let us know your thoughts in the comments section.