
Create summaries of recordings using generative AI with Amazon Bedrock and Amazon Transcribe


Meeting notes are a crucial part of collaboration, yet they often fall through the cracks. Between leading discussions, listening closely, and typing notes, it’s easy for key information to slip away unrecorded. Even when notes are captured, they can be disorganized or illegible, rendering them useless.

In this post, we explore how to use Amazon Transcribe and Amazon Bedrock to automatically generate clean, concise summaries of video or audio recordings. Whether it’s an internal team meeting, conference session, or earnings call, this approach can help you distill hours of content down to salient points.

We walk through a solution to transcribe a project team meeting and summarize the key takeaways with Amazon Bedrock. We also discuss how you can customize this solution for other common scenarios like course lectures, interviews, and sales calls. Read on to simplify and automate your note-taking process.

Solution overview

By combining Amazon Transcribe and Amazon Bedrock, you can save time, capture insights, and enhance collaboration. Amazon Transcribe is an automatic speech recognition (ASR) service that makes it straightforward to add speech-to-text capability to applications. It uses advanced deep learning technologies to accurately transcribe audio into text. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon with a single API, along with a broad set of capabilities you need to build generative AI applications. With Amazon Bedrock, you can easily experiment with a variety of top FMs, and privately customize them with your data using techniques such as fine-tuning and Retrieval Augmented Generation (RAG).

The solution presented in this post is orchestrated using an AWS Step Functions state machine that is triggered when you upload a recording to the designated Amazon Simple Storage Service (Amazon S3) bucket. Step Functions lets you create serverless workflows to orchestrate and connect components across AWS services. It handles the underlying complexity so you can focus on application logic. It’s useful for coordinating tasks, distributed processing, ETL (extract, transform, and load), and business process automation.
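The following Python sketch makes the orchestration more concrete. It builds a minimal Amazon States Language (ASL) definition that chains three Lambda tasks: transcribe, summarize, and notify. The state names, the function ARNs, and the strictly linear flow are assumptions for illustration. The template in the GitHub repo may differ, for example by adding states that wait for the asynchronous transcription job to finish.

```python
import json


def build_state_machine_definition(transcribe_fn_arn: str,
                                   summarize_fn_arn: str,
                                   notify_fn_arn: str) -> str:
    """Return a minimal ASL definition that runs three Lambda tasks in order:
    transcribe -> summarize -> notify. State names are hypothetical."""
    states = {
        "TranscribeRecording": {
            "Type": "Task",
            "Resource": transcribe_fn_arn,  # a Lambda function ARN
            "Next": "SummarizeTranscript",
        },
        "SummarizeTranscript": {
            "Type": "Task",
            "Resource": summarize_fn_arn,
            "Next": "SendSummary",
        },
        "SendSummary": {
            "Type": "Task",
            "Resource": notify_fn_arn,
            "End": True,  # last state in the workflow
        },
    }
    definition = {
        "Comment": "Hypothetical sketch of the summary-generator workflow",
        "StartAt": "TranscribeRecording",
        "States": states,
    }
    return json.dumps(definition, indent=2)
```

You could pass the returned string as the `definition` argument of the Step Functions `create_state_machine` API. In this solution, however, CloudFormation creates the state machine for you.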

The following diagram illustrates the high-level solution architecture.

The solution workflow includes the following steps:

1. A user stores a recording in the S3 asset bucket.
2. This action triggers the Step Functions transcription and summarization state machine.
3. As part of the state machine, an AWS Lambda function is invoked to transcribe the recording using Amazon Transcribe and store the transcription in the asset bucket.
4. A second Lambda function retrieves the transcription and generates a summary using the Anthropic Claude model in Amazon Bedrock.
5. Finally, another Lambda function uses Amazon Simple Notification Service (Amazon SNS) to send the summary of the recording to the recipient.
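The following sketch shows how the transcription task might work. It is a hypothetical Lambda helper and is not taken from the repo. It assumes that the recording's bucket and key are already known (for example, from the event that started the state machine), that the audio is English (`en-US`), and that the transcript should be written under the `transcripts/` prefix of the same bucket. The actual Amazon Transcribe call is `start_transcription_job`, made through a boto3 client that you pass in.

```python
import os
import re

# Recording formats listed in this post, as Amazon Transcribe MediaFormat values.
SUPPORTED_FORMATS = {"mp3", "mp4", "wav", "flac", "amr", "ogg", "webm"}


def build_transcription_request(bucket: str, key: str,
                                language_code: str = "en-US") -> dict:
    """Build keyword arguments for Transcribe's start_transcription_job.
    The job naming scheme and output key layout are assumptions."""
    base, ext = os.path.splitext(os.path.basename(key))
    media_format = ext.lstrip(".").lower()
    if media_format not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported recording format: {ext!r}")
    # Job names may only contain letters, digits, '.', '_' and '-'.
    job_name = re.sub(r"[^0-9A-Za-z._-]", "-", base)[:200]
    return {
        "TranscriptionJobName": job_name,
        "LanguageCode": language_code,
        "MediaFormat": media_format,
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        "OutputBucketName": bucket,
        "OutputKey": f"transcripts/{job_name}.json",
    }


def start_transcription(transcribe_client, bucket: str, key: str) -> str:
    """Start an asynchronous transcription job and return its name."""
    request = build_transcription_request(bucket, key)
    transcribe_client.start_transcription_job(**request)
    return request["TranscriptionJobName"]
```

`start_transcription_job` returns immediately. A later state therefore needs to poll `get_transcription_job` until the job status is `COMPLETED` before the summary step runs. Job names must also be unique in your account, so a production version would likely append a timestamp or ID to the name.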

This solution is supported in Regions where Anthropic Claude on Amazon Bedrock is available.
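To see which Claude models are offered in a Region, you can call the Amazon Bedrock `ListFoundationModels` API through the `bedrock` control-plane client. The following sketch keeps only the Claude model IDs and excludes the Claude Instant variants. The helper name and the filtering by model ID prefix are assumptions. A model that appears in this list is not necessarily one your account has been granted access to.

```python
def claude_model_ids(bedrock_client) -> list:
    """Return Anthropic Claude model IDs offered in the client's Region,
    excluding Claude Instant variants, which this solution doesn't use."""
    response = bedrock_client.list_foundation_models()
    return [
        summary["modelId"]
        for summary in response["modelSummaries"]
        if summary["modelId"].startswith("anthropic.claude")
        and "instant" not in summary["modelId"].lower()
    ]
```

For example, `claude_model_ids(boto3.client("bedrock", region_name="us-east-1"))` lists the candidate model IDs for that Region.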

The state machine runs the transcription, summarization, and notification tasks in sequence. The following diagram illustrates the detailed process.

Prerequisites

Amazon Bedrock users need to request access to models before they are available for use. This is a one-time action. For this solution, you’ll need to enable access to the Anthropic Claude (not Anthropic Claude Instant) model in Amazon Bedrock. For more information, refer to Model access.
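After model access is enabled, the summarization task can call Claude through the Amazon Bedrock runtime `InvokeModel` API, and the notification task can publish the result to Amazon SNS. The following is a minimal sketch of those two Lambda helpers. It assumes the Claude v2 Text Completions request format (a body with `prompt` and `max_tokens_to_sample`, the parameter this post mentions later) and the model ID `anthropic.claude-v2`. The function names, the `<transcript>` tags in the prompt, and the SNS subject line are illustrative assumptions.

```python
import json

# Assumption: Claude v2 model ID; use whichever Claude model you enabled.
MODEL_ID = "anthropic.claude-v2"


def build_claude_body(instructions: str, transcript: str,
                      max_tokens_to_sample: int = 1000) -> str:
    """Build a Text Completions request body for Claude on Amazon Bedrock."""
    # The prompt must start with "\n\nHuman:" and end with "\n\nAssistant:".
    prompt = (
        f"\n\nHuman: {instructions}\n\n"
        f"<transcript>\n{transcript}\n</transcript>"
        "\n\nAssistant:"
    )
    return json.dumps({"prompt": prompt,
                       "max_tokens_to_sample": max_tokens_to_sample})


def summarize(bedrock_runtime, instructions: str, transcript: str) -> str:
    """Invoke the model and return the generated summary text."""
    response = bedrock_runtime.invoke_model(
        modelId=MODEL_ID,
        body=build_claude_body(instructions, transcript),
        contentType="application/json",
        accept="application/json",
    )
    # The response body is a stream; Claude's text is in "completion".
    payload = json.loads(response["body"].read())
    return payload["completion"].strip()


def send_summary(sns_client, topic_arn: str, summary: str) -> str:
    """Publish the summary to the SNS topic the email address subscribes to."""
    response = sns_client.publish(TopicArn=topic_arn,
                                  Subject="Recording summary",
                                  Message=summary)
    return response["MessageId"]
```

In a Lambda function, you would pass `boto3.client("bedrock-runtime")` and `boto3.client("sns")` as the clients, and the instructions would come from the stack's summary instructions parameter.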

Deploy solution resources

The solution is deployed using an AWS CloudFormation template, found on the GitHub repo, to automatically provision the necessary resources in your AWS account. The template requires the following parameters:

- Email address used to send summary – The summary will be sent to this address. You must acknowledge the initial Amazon SNS confirmation email before receiving additional notifications.
- Summary instructions – These are the instructions given to the Amazon Bedrock model to generate the summary.
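If you prefer to deploy from a script instead of the console, you can use the CloudFormation `CreateStack` API through boto3. The following sketch shows one way to do this. The parameter keys `EmailAddress` and `SummaryInstructions`, the default stack name, and the `CAPABILITY_IAM` acknowledgment are all assumptions, so check the template in the GitHub repo for its actual parameter names.

```python
def build_create_stack_args(stack_name: str, template_body: str,
                            email: str, instructions: str) -> dict:
    """Build keyword arguments for CloudFormation's create_stack call."""
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        "Parameters": [
            # Hypothetical parameter keys; match them to the template.
            {"ParameterKey": "EmailAddress", "ParameterValue": email},
            {"ParameterKey": "SummaryInstructions",
             "ParameterValue": instructions},
        ],
        # Assumption: the template creates IAM roles for the Lambda functions.
        "Capabilities": ["CAPABILITY_IAM"],
    }


def deploy(cfn_client, template_path: str, email: str, instructions: str,
           stack_name: str = "summary-generator") -> str:
    """Create the stack, wait for creation to finish, and return the stack ID."""
    with open(template_path, encoding="utf-8") as f:
        template_body = f.read()
    response = cfn_client.create_stack(
        **build_create_stack_args(stack_name, template_body, email, instructions))
    cfn_client.get_waiter("stack_create_complete").wait(StackName=stack_name)
    return response["StackId"]
```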

Run the solution

After you deploy the solution using AWS CloudFormation, complete the following steps:

Acknowledge the Amazon SNS email confirmation that you should receive a few moments after creating the CloudFormation stack.
On the AWS CloudFormation console, navigate to the stack you just created.
On the stack's Outputs tab, look for the value associated with AssetBucketName; it will look something like summary-generator-assetbucket-xxxxxxxxxxxxx.
On the Amazon S3 console, navigate to your asset bucket.

This is where you’ll upload your recordings. Valid file formats are MP3, MP4, WAV, FLAC, AMR, OGG, and WebM.

Upload your recording to the recordings folder.

Uploading recordings will automatically trigger the Step Functions state machine. For this example, we use a sample team meeting recording in the sample-recording directory of the GitHub repository.
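You can also script the console steps above. The following sketch reads the `AssetBucketName` output from the stack, checks the file extension against the formats listed earlier, and uploads the file under the `recordings/` prefix, which starts the state machine. The default stack name and the helper names are assumptions.

```python
import os

# Formats listed in this post as valid for recordings.
VALID_EXTENSIONS = {".mp3", ".mp4", ".wav", ".flac", ".amr", ".ogg", ".webm"}


def get_stack_output(cfn_client, stack_name: str, output_key: str) -> str:
    """Return one output value from a deployed CloudFormation stack."""
    stack = cfn_client.describe_stacks(StackName=stack_name)["Stacks"][0]
    for output in stack.get("Outputs", []):
        if output["OutputKey"] == output_key:
            return output["OutputValue"]
    raise KeyError(f"{output_key} not found in outputs of {stack_name}")


def recording_key(path: str, prefix: str = "recordings/") -> str:
    """Map a local file to its S3 key, rejecting unsupported formats."""
    name = os.path.basename(path)
    if os.path.splitext(name)[1].lower() not in VALID_EXTENSIONS:
        raise ValueError(f"Unsupported recording format: {name}")
    return prefix + name


def upload_recording(s3_client, cfn_client, path: str,
                     stack_name: str = "summary-generator") -> str:
    """Upload a recording to the asset bucket and return its S3 URI."""
    bucket = get_stack_output(cfn_client, stack_name, "AssetBucketName")
    key = recording_key(path)
    s3_client.upload_file(path, bucket, key)
    return f"s3://{bucket}/{key}"
```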

On the Step Functions console, navigate to the summary-generator state machine.
Choose the name of the state machine run with the status Running.

Here, you can watch the progress of the state machine as it processes the recording.

After it reaches its Success state, you should receive an emailed summary of the recording.
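Instead of watching the console, you can poll the run programmatically. The following sketch uses the Step Functions `ListExecutions` API to find the most recent running execution, checking only the first page of results. It then polls `DescribeExecution` until the execution reaches a terminal status. The polling interval, the timeout, and the helper names are assumptions.

```python
import time

# Step Functions statuses after which an execution no longer changes.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "TIMED_OUT", "ABORTED"}


def latest_running_execution(sfn_client, state_machine_arn: str):
    """Return the ARN of the newest RUNNING execution, or None if none exists."""
    executions = sfn_client.list_executions(
        stateMachineArn=state_machine_arn, statusFilter="RUNNING")["executions"]
    if not executions:
        return None
    return max(executions, key=lambda e: e["startDate"])["executionArn"]


def wait_for_execution(sfn_client, execution_arn: str,
                       poll_seconds: float = 10.0,
                       timeout_seconds: float = 1800.0,
                       sleep=time.sleep) -> str:
    """Poll until the execution finishes and return its final status."""
    waited = 0.0
    while True:
        status = sfn_client.describe_execution(
            executionArn=execution_arn)["status"]
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"{execution_arn} still {status} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

The `sleep` parameter exists only so that the wait can be replaced in tests. In normal use, the default `time.sleep` is what you want.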

You can also view the transcript in the transcripts folder of the S3 asset bucket.
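To reuse the transcript elsewhere, the following sketch reads it from the bucket and extracts the plain text. It assumes that the file in the `transcripts` folder is the standard JSON output written by Amazon Transcribe, where the full text is under `results.transcripts[].transcript`. If the solution stores plain text instead, you can skip the parsing step. The helper names are hypothetical.

```python
import json


def transcript_text(raw_json) -> str:
    """Extract the full transcript text from Amazon Transcribe JSON output."""
    data = json.loads(raw_json)  # accepts str or bytes
    return " ".join(t["transcript"] for t in data["results"]["transcripts"])


def read_transcript(s3_client, bucket: str, key: str) -> str:
    """Download a transcript file from S3 and return its text."""
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
    return transcript_text(body)
```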

Review the summary

You will receive the recording summary at the email address you provided when you created the CloudFormation stack. If the email doesn't arrive within a few moments, make sure that you acknowledged the Amazon SNS confirmation email sent after you created the stack. Then upload the recording again to rerun the summary process.

This solution includes a mock team meeting recording that you can use to test the solution. Your summary should look similar to the following example. Because generative AI output varies from run to run, your summary will differ somewhat, but the content should be close.

Here are the key points from the standup:

Joe finished reviewing the current state for task EDU1 and created a new task to develop the future state. That new task is in the backlog to be prioritized. He’s now starting EDU2 but is blocked on resource selection.

Rob created a tagging strategy for SLG1 based on best practices, but may need to coordinate with other teams who have created their own strategies, to align on a uniform approach. A new task was created to coordinate tagging strategies.

Rob has made progress debugging for SLG2 but may need additional help. This task will be moved to Sprint 2 to allow time to get extra resources.

Next Steps:

- Joe to continue working on EDU2 as able until resource selection is decided
- New task to be prioritized to coordinate tagging strategies across teams
- SLG2 moved to Sprint 2
- Standups moving to Mondays starting next week

Expand the solution

Now that you have a working solution, here are some potential ideas to customize the solution for your specific use cases:

- Try altering the process to fit your available source content and desired outputs:
  - For situations where transcripts already exist, create an alternate Step Functions workflow to ingest existing text-based or PDF-based transcriptions.
  - Instead of using Amazon SNS to notify recipients by email, you can use it to send the output to a different endpoint, such as a team collaboration site or the team's chat channel.
- Try changing the summary instructions CloudFormation stack parameter, which is the generative AI prompt provided to Amazon Bedrock, to produce outputs specific to your use case:
  - When summarizing a company's earnings call, you could have the model focus on promising opportunities, areas of concern, and things that you should continue to monitor.
  - If you are using this to summarize a course lecture, the model could identify upcoming assignments, summarize key concepts, list facts, and filter out any small talk from the recording.
- For the same recording, create different summaries for different audiences:
  - Engineers' summaries focus on design decisions, technical challenges, and upcoming deliverables.
  - Project managers' summaries focus on timelines, costs, deliverables, and action items.
  - Project sponsors get a brief update on project status and escalations.
- For longer recordings, try generating summaries for different levels of interest and time commitment, such as a single sentence, single paragraph, single page, or in-depth summary. In addition to the prompt, you may want to adjust the max_tokens_to_sample parameter, which caps the length of the generated output, to accommodate different summary lengths.
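The following sketch is a starting point for the audience-specific and length-specific ideas above. It builds one Claude request body per audience from the same transcript, using the Claude Text Completions format (`prompt` plus `max_tokens_to_sample`). The instruction wording and the token budgets are illustrative assumptions that you should tune for your content.

```python
import json

# Hypothetical instructions derived from the audience list above.
AUDIENCE_INSTRUCTIONS = {
    "engineers": "Summarize the design decisions, technical challenges, "
                 "and upcoming deliverables.",
    "project_managers": "Summarize the timelines, costs, deliverables, "
                        "and action items.",
    "sponsors": "Give a brief update on project status and any escalations.",
}

# Hypothetical length options: (phrasing added to the prompt, token cap).
LENGTHS = {
    "sentence": ("Respond in a single sentence.", 100),
    "paragraph": ("Respond in a single paragraph.", 300),
    "page": ("Keep the summary to about one page.", 1000),
    "in_depth": ("Write an in-depth summary.", 3000),
}


def build_body(transcript: str, audience: str, length: str) -> dict:
    """Build a Claude request body for one audience and summary length."""
    length_hint, max_tokens = LENGTHS[length]
    instructions = f"{AUDIENCE_INSTRUCTIONS[audience]} {length_hint}"
    prompt = (f"\n\nHuman: {instructions}\n\n"
              f"<transcript>\n{transcript}\n</transcript>\n\nAssistant:")
    # max_tokens_to_sample only caps the output; the prompt sets the target length.
    return {"prompt": prompt, "max_tokens_to_sample": max_tokens}


def bodies_for_all_audiences(transcript: str, length: str = "paragraph") -> dict:
    """Return a JSON request body per audience, ready for invoke_model."""
    return {audience: json.dumps(build_body(transcript, audience, length))
            for audience in AUDIENCE_INSTRUCTIONS}
```

The prompt phrasing and `max_tokens_to_sample` are kept separate on purpose. The phrasing steers how long the summary should be, and the token cap prevents a runaway response if the model ignores that hint.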

Clean up

To clean up the solution, delete the CloudFormation stack that you created earlier. Note that deleting the stack will not delete the asset bucket. If you no longer need the recordings or transcripts, you can delete this bucket separately. Amazon Transcribe will automatically delete transcription jobs after 90 days, but you can delete these manually before then.
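You can also script the cleanup. The following sketch deletes the stack, empties and deletes the retained asset bucket, and deletes transcription jobs whose names contain a given string. It assumes that the bucket is not versioned (a versioned bucket also needs its object versions deleted) and that you supply a name filter matching only this solution's jobs. The filter and the helper names are hypothetical.

```python
def empty_bucket(s3_client, bucket: str) -> int:
    """Delete every object in a non-versioned bucket; return the count deleted."""
    deleted = 0
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # Each page holds at most 1,000 keys, which is the delete_objects limit.
        objects = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if objects:
            s3_client.delete_objects(Bucket=bucket, Delete={"Objects": objects})
            deleted += len(objects)
    return deleted


def delete_transcription_jobs(transcribe_client, name_contains: str) -> list:
    """Delete transcription jobs whose names contain name_contains."""
    names, kwargs = [], {"JobNameContains": name_contains}
    # Collect all matching names first so that deletions don't disturb paging.
    while True:
        page = transcribe_client.list_transcription_jobs(**kwargs)
        names += [job["TranscriptionJobName"]
                  for job in page.get("TranscriptionJobSummaries", [])]
        if not page.get("NextToken"):
            break
        kwargs["NextToken"] = page["NextToken"]
    for name in names:
        transcribe_client.delete_transcription_job(TranscriptionJobName=name)
    return names


def clean_up(cfn_client, s3_client, transcribe_client, stack_name: str,
             bucket: str, job_name_filter: str) -> None:
    """Delete the stack, then the retained asset bucket and matching jobs."""
    cfn_client.delete_stack(StackName=stack_name)
    empty_bucket(s3_client, bucket)
    s3_client.delete_bucket(Bucket=bucket)
    delete_transcription_jobs(transcribe_client, job_name_filter)
```

Look up the AssetBucketName output before you delete the stack, because the stack outputs are no longer available after deletion.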

Conclusion

In this post, we explored how to use Amazon Transcribe and Amazon Bedrock to automatically generate clean, concise summaries of video or audio recordings. We encourage you to continue evaluating Amazon Bedrock, Amazon Transcribe, and other AWS AI services, like Amazon Textract, Amazon Translate, and Amazon Rekognition, to see how they can help meet your business objectives.

About the Authors

Rob Barnes is a principal consultant for AWS Professional Services. He works with our customers to address security and compliance requirements at scale in complex, multi-account AWS environments through automation.

Jason Stehle is a Senior Solutions Architect at AWS, based in the New England area. He works with customers to align AWS capabilities with their greatest business challenges. Outside of work, he spends his time building things and watching comic book movies with his family.