In today’s data-intensive business landscape, organizations face the challenge of extracting valuable insights from diverse data sources scattered across their infrastructure. Whether it’s structured data in databases or unstructured content in document repositories, enterprises often struggle to efficiently query and use this wealth of information.
In this post, we explore how you can use Amazon Q Business, the AWS generative AI-powered assistant, to build a centralized knowledge base for your organization, unifying structured and unstructured datasets from different sources to accelerate decision-making and drive productivity. The solution combines data from an Amazon Aurora MySQL-Compatible Edition database and data stored in an Amazon Simple Storage Service (Amazon S3) bucket.
Amazon Q Business is a fully managed, generative AI-powered assistant that helps enterprises unlock the value of their data and knowledge. The key to using the full potential of Amazon Q lies in its ability to seamlessly integrate and query multiple data sources, from structured databases to unstructured content stores. In this solution, we use Amazon Q to build a comprehensive knowledge base that combines sales-related data from an Aurora MySQL database and sales documents stored in an S3 bucket. Aurora MySQL-Compatible is a fully managed, MySQL-compatible, relational database engine that combines the speed and reliability of high-end commercial databases with the simplicity and cost-effectiveness of open-source databases. Amazon S3 is an object storage service that offers industry-leading scalability, data availability, security, and performance.
This custom knowledge base that connects these diverse data sources enables Amazon Q to seamlessly respond to a wide range of sales-related questions using the chat interface. The following diagram illustrates the solution architecture.
For this walkthrough, you should have the following prerequisites:
Establishing a VPC provides a secure, isolated network environment for hosting the data sources that Amazon Q Business will access to index. In this post, we use an Aurora MySQL database in a private subnet, and Amazon Q Business accesses the private DB instance in a secure manner using an interface VPC endpoint.
Complete the following steps:
Each interface endpoint is represented by one or more elastic network interfaces in your subnets, which is then used by Amazon Q Business to connect to the private database.
Complete the following steps to create an Aurora MySQL database to host the structured sales data:
In this post, you connect to the private DB instance from the MySQL Workbench client on your local machine through an EC2 bastion host. Launch the EC2 instance in the public subnet of the VPC you configured. The security group attached to this EC2 bastion host instance should be configured to allow SSH traffic (port 22) from your local machine’s IP address. To facilitate the connection between the EC2 bastion host and the Aurora MySQL database, the security group for the Aurora MySQL database should have an inbound rule to allow MySQL traffic (port 3306) from the security group of the EC2 bastion host. Conversely, the security group for the EC2 bastion host should have an outbound rule to allow traffic to the security group of the Aurora MySQL database on port 3306. Refer to Controlling access with security groups for more details.
An Amazon Q Business application requires you to use IAM Identity Center to manage user access. IAM Identity Center is a single place where you can assign your workforce users, also known as workforce identities, to provide consistent access to multiple AWS accounts and applications. In this post, we use IAM Identity Center as the SAML 2.0-aligned identity provider (IdP). Make sure you have enabled an IAM Identity Center instance, provisioned at least one user, and provided each user with a valid email address. The Amazon Q Business application needs to be in the same Region as the IAM Identity Center instance. For more information on enabling users in IAM Identity Center, see Add users to your Identity Center directory.
Create a S3 bucket in the us-east-1 Region with the default settings and create a folder with a name of your choice inside the bucket.
In this post, we use two sample datasets: a total sales dataset CSV file and a sales target document in PDF format. The total sales dataset contains information about orders placed by customers located in various geographical locations, through different sales channels. The sales document contains information about sales targets for the year for each of the sales channel. Complete the steps in the section below to load both datasets.
In the Amazon Q Business application, you create two indexes for the same Aurora MySQL table: one on the total sales dataset and another on an aggregated view of the total sales data, to cater to the different type of queries. Complete the following steps:
This enables you to manage and interact with your database resources directly from your local MySQL Workbench client.
If you encounter the error LOAD DATA LOCAL INFILE file request rejected due to restrictions on access when running the statements in MySQL Workbench 8.0, you might need to edit the connection. On the Connection tab, go to the Advanced sub-tab, and in the Others field, add the line OPT_LOCAL_INFILE=1
and start a new query tab after testing the connection.
This should return 7,991 rows.
The following screenshot shows the database table schema and the sample data in the table.
Download the sample file 2020_Sales_Target.pdf
in your local environment and upload it to the S3 bucket you created. This sales target document contains information about the sales target for four sales channels and looks like the following screenshot.
Complete the following steps to create an Amazon Q application:
sales_analyzer
).Complete the following steps to configure Amazon Q to connect to Aurora MySQL-Compatible:
aurora_mysql_sales
).<databasename>.<ID>.<region>.rds.amazonaws.com
).You can obtain the endpoint on the Amazon RDS console for the instance on the Connectivity & security tab.
3306
.sales
).This select statement returns a primary key column, a document title column, and a text column that serves your document body for Amazon Q to answer questions. Make sure you don’t put ; at the end of the query.
order_number
.sales_channel
.sales_details
.This process may take a few minutes to complete. After the aurora_mysql_sales
data source is added, you will be redirected to the Connect data sources page.
aggregated_sales
, for the same database but with the following details in the Sync scope This data source will be used by Amazon Q for answering questions on aggregated sales. scoy_id
.sales_channel
.sales_aggregates
.After adding the aggregated_sales
data source, you will be redirected to the Connect data sources page again.
Complete the following steps to configure Amazon Q to connect to Amazon S3:
s3_sales_targets
) and a description.After few minutes, the application will be created and you will be taken to the Applications page on the Amazon Q Business console.
Choose the name of your application and navigate to the Data sources section. For each of the three data sources, select the data source and choose Sync now. It will take several minutes to complete. After the sources have synced, you should see the Last sync status show as Completed.
At this point, you have created an Amazon Q application, synced the data source, and deployed the web experience. You can customize your web experience to make it more intuitive to your application users.
You can now test the application by asking different questions, as shown in the following screenshot. You can observe in the following question that the channel names were fetched from the Amazon S3 sales target PDF.
The following screenshots show more example interactions.
The answer in the preceding example was derived from the two sources: the S3 bucket and the Aurora database. You can verify the output by cross-referencing the PDF, which has a target as $12 million for the in-store sales channel in 2020. The following SQL shows the actual sales achieved in 2020 for the same channel:
As seen from the sales target PDF data, the 2020 sales target for the distributor sales channel was $7 million.
The following SQL in the Aurora MySQL database shows the actual sales achieved in 2020 for the same channel:
The following screenshots show additional questions.
You can verify the preceding answers with the following SQL:
To avoid incurring future charges, clean up any resources you created as part of this solution, including the Amazon Q Business application:
In this post, we demonstrated how organizations can use Amazon Q to build a unified knowledge base that integrates structured data from an Aurora MySQL database and unstructured data from an S3 bucket. By connecting these disparate data sources, Amazon Q enables you to seamlessly query information from two data sources and gain valuable insights that drive better decision-making.
We encourage you to try this solution and share your experience in the comments. Additionally, you can explore the many other data sources that Amazon Q for Business can seamlessly integrate with, empowering you to build robust and insightful applications.
Learn how AI can boost efficiency on your retail marketing efforts in 2025 and beyond.
Recommender systems enhance user experiences in Internet-based applications by recommending items tailored to individual preferences…
Your next big customer doesn't speak your language. In fact, 40% of global consumers won't…
OpenScholar, an innovative AI system by Allen Institute for AI and University of Washington, revolutionizes…
Chip production delays and a rumored overheating issue haven’t slowed down Nvidia, which reported another…
In order to use remote locations to record and assess the behavior of wildlife and…