Recordings of business meetings, interviews, and customer interactions have become essential for preserving important information. However, transcribing and summarizing these recordings manually is often time-consuming and labor-intensive. With the progress in generative AI and automatic speech recognition (ASR), automated solutions have emerged to make this process faster and more efficient.
Protecting personally identifiable information (PII) is a vital aspect of data security, driven by both ethical responsibilities and legal requirements. In this post, we demonstrate how to use the OpenAI Whisper Large V3 Turbo foundation model (FM), available in Amazon Bedrock Marketplace (which offers access to over 140 models through a dedicated offering), to produce near real-time transcription. These transcriptions are then processed by Amazon Bedrock for summarization and redaction of sensitive information.
Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs from leading AI companies like AI21 Labs, Anthropic, Cohere, DeepSeek, Luma, Meta, Mistral AI, poolside (coming soon), Stability AI, and Amazon Nova through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Additionally, you can use Amazon Bedrock Guardrails to automatically redact sensitive information, including PII, from the transcription summaries to support compliance and data protection needs.
In this post, we walk through an end-to-end architecture that combines a React-based frontend with Amazon Bedrock, AWS Lambda, and AWS Step Functions to orchestrate the workflow, facilitating seamless integration and processing.
The solution highlights the power of integrating serverless technologies with generative AI to automate and scale content processing workflows. The user journey begins with uploading a recording through a React frontend application, hosted on Amazon CloudFront and backed by Amazon Simple Storage Service (Amazon S3) and Amazon API Gateway. When the file is uploaded, it triggers a Step Functions state machine that orchestrates the core processing steps, using AI models and Lambda functions for seamless data flow and transformation. The following diagram illustrates the solution architecture.
The workflow consists of the following steps:
The following diagram illustrates the state machine workflow.
The Step Functions state machine orchestrates a series of tasks to transcribe, summarize, and redact sensitive information from uploaded audio/video recordings:
Before you start, make sure that you have the following prerequisites in place:
For instructions for creating guardrails in Amazon Bedrock, refer to Create a guardrail. For details on detecting and redacting PII, see Remove PII from conversations by using sensitive information filters. Configure your guardrail with the following key settings:
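If you prefer to create the guardrail programmatically, the following sketch builds a request for the `CreateGuardrail` API with PII entities set to anonymize. The guardrail name, messaging strings, and the specific PII entity types are illustrative assumptions; adjust them to your compliance requirements.

```python
import json

# Hypothetical PII entity types to anonymize; adjust to your compliance needs.
PII_ENTITIES = ["NAME", "EMAIL", "PHONE", "ADDRESS"]

def build_guardrail_config(name="audio-summary-guardrail"):
    """Build the request body for bedrock.create_guardrail with PII anonymization."""
    return {
        "name": name,
        "description": "Redacts PII from transcription summaries",
        "sensitiveInformationPolicyConfig": {
            "piiEntitiesConfig": [
                {"type": entity, "action": "ANONYMIZE"} for entity in PII_ENTITIES
            ]
        },
        "blockedInputMessaging": "Input blocked by guardrail.",
        "blockedOutputsMessaging": "Output blocked by guardrail.",
    }

config = build_guardrail_config()
print(json.dumps(config["sensitiveInformationPolicyConfig"], indent=2))

# To create the guardrail (requires AWS credentials):
# import boto3
# bedrock = boto3.client("bedrock")
# response = bedrock.create_guardrail(**config)
# guardrail_arn = response["guardrailArn"]
```

With `action` set to `ANONYMIZE` (rather than `BLOCK`), detected PII is replaced with placeholders instead of rejecting the whole output.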
After you deploy the guardrail, note its Amazon Resource Name (ARN); you will use it when you deploy the model.
Complete the following steps to deploy the Whisper Large V3 Turbo model:
This creates a new AWS Identity and Access Management (IAM) role and deploys the model.
Choose Marketplace deployments in the navigation pane; in the Managed deployments section, the endpoint status shows as Creating. Wait for the deployment to finish and the status to change to In Service, then copy the endpoint name; you will use it when deploying the backend infrastructure.
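Instead of watching the console, you can poll the endpoint status with the SageMaker `DescribeEndpoint` API. The following is a minimal sketch; it takes the boto3 SageMaker client as a parameter (for example, `boto3.client("sagemaker")`), and the endpoint name is whatever Amazon Bedrock Marketplace assigned to your deployment.

```python
import time

def wait_for_endpoint(sm_client, endpoint_name, poll_seconds=30):
    """Poll a SageMaker endpoint until its status reaches InService.

    sm_client is a boto3 SageMaker client, e.g. boto3.client("sagemaker").
    Raises if the endpoint enters a terminal failure status.
    """
    while True:
        status = sm_client.describe_endpoint(
            EndpointName=endpoint_name
        )["EndpointStatus"]
        if status == "InService":
            return status
        if status in ("Failed", "OutOfService"):
            raise RuntimeError(f"Endpoint {endpoint_name} entered status {status}")
        time.sleep(poll_seconds)
```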
In the GitHub repo, follow the instructions in the README file to clone the repository, then deploy the frontend and backend infrastructure.
We use the AWS Cloud Development Kit (AWS CDK) to define and deploy the infrastructure. The AWS CDK code deploys the following resources:
The backend is composed of a sequence of Lambda functions, each handling a specific stage of the audio processing pipeline:
Let’s examine some of the key components:
The transcription Lambda function uses the Whisper model to convert audio files to text:
import json

import boto3

sagemaker_runtime = boto3.client("sagemaker-runtime")

def transcribe_with_whisper(audio_chunk, endpoint_name):
    # Convert audio bytes to hex string format
    hex_audio = audio_chunk.hex()

    # Create the payload for the Whisper model
    payload = {
        "audio_input": hex_audio,
        "language": "english",
        "task": "transcribe",
        "top_p": 0.9
    }

    # Invoke the SageMaker endpoint running Whisper
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload)
    )

    # Parse the transcription response
    response_body = json.loads(response["Body"].read().decode("utf-8"))
    return response_body["text"]
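Because SageMaker real-time endpoints cap request payloads at roughly 6 MB, long recordings are typically split into chunks before calling the function above. The following sketch shows simple byte-level chunking; the chunk size is an illustrative assumption, and in practice you would split on silence boundaries with an audio library so that words are not cut mid-chunk.

```python
CHUNK_SIZE = 4 * 1024 * 1024  # ~4 MB per request, below the endpoint payload limit

def chunk_audio(audio_bytes, chunk_size=CHUNK_SIZE):
    """Split raw audio bytes into payload-sized chunks."""
    return [audio_bytes[i:i + chunk_size]
            for i in range(0, len(audio_bytes), chunk_size)]

def transcribe_recording(audio_bytes, endpoint_name):
    """Transcribe each chunk and join the partial transcripts.

    Assumes transcribe_with_whisper from the snippet above is in scope.
    """
    parts = [transcribe_with_whisper(chunk, endpoint_name)
             for chunk in chunk_audio(audio_bytes)]
    return " ".join(parts)
```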
We use Amazon Bedrock to generate concise summaries from the transcriptions:
import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def generate_summary(transcription):
    # Format the prompt with the transcription
    prompt = f"{transcription}\n\nGive me the summary, speakers, key discussions, and action items with owners"

    # Call Amazon Bedrock for summarization (Claude 3.x models use the Messages API)
    response = bedrock_runtime.invoke_model(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 4096,
            "temperature": 0.7,
            "top_p": 0.9,
            "messages": [{"role": "user", "content": prompt}],
        })
    )

    # Extract and return the summary text
    result = json.loads(response["body"].read())
    return result["content"][0]["text"]
A critical component of our solution is the automatic redaction of PII. We implemented this using Amazon Bedrock Guardrails to support compliance with privacy regulations:
def apply_guardrail(bedrock_runtime, content, guardrail_id):
    # Format content according to the ApplyGuardrail API requirements
    formatted_content = [{"text": {"text": content}}]

    # Call the guardrail API
    response = bedrock_runtime.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion="DRAFT",
        source="OUTPUT",  # Evaluate the content as model output
        content=formatted_content
    )

    # Extract redacted text from the response
    if response.get("action") == "GUARDRAIL_INTERVENED":
        if len(response["outputs"]) > 0:
            output = response["outputs"][0]
            if "text" in output and isinstance(output["text"], str):
                return output["text"]

    # Return the original content if the guardrail did not intervene
    return content
When PII is detected, it’s replaced with type indicators (for example, {PHONE} or {EMAIL}), making sure that summaries remain informative while protecting sensitive data.
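To illustrate the placeholder format, the regex-based masking below is only a local stand-in, not the guardrail itself; in the solution, Amazon Bedrock Guardrails performs the actual detection and redaction. The sample text and patterns are illustrative assumptions.

```python
import re

# Local stand-in for guardrail redaction, only to show the placeholder format.
PATTERNS = {
    "{EMAIL}": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "{PHONE}": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def mask_pii(text):
    """Replace email addresses and phone numbers with type placeholders."""
    for placeholder, pattern in PATTERNS.items():
        text = pattern.sub(placeholder, text)
    return text

summary = "Action item: John to email jane.doe@example.com or call +1 555-010-9999."
print(mask_pii(summary))
```

The masked summary keeps its meaning ("email {EMAIL} or call {PHONE}") while the sensitive values are gone, which is the same property the guardrail preserves in the generated summaries.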
To manage the complex processing pipeline, we use Step Functions to orchestrate the Lambda functions:
{
  "Comment": "Audio Summarization Workflow",
  "StartAt": "TranscribeAudio",
  "States": {
    "TranscribeAudio": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "WhisperTranscriptionFunction",
        "Payload": {
          "bucket.$": "$.bucket",
          "key.$": "$.key"
        }
      },
      "Next": "IdentifySpeakers"
    },
    "IdentifySpeakers": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "SpeakerIdentificationFunction",
        "Payload": {
          "Transcription.$": "$.Payload"
        }
      },
      "Next": "GenerateSummary"
    },
    "GenerateSummary": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "BedrockSummaryFunction",
        "Payload": {
          "SpeakerIdentification.$": "$.Payload"
        }
      },
      "End": true
    }
  }
}
This workflow makes sure each step completes successfully before proceeding to the next, with automatic error handling and retry logic built in.
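In this solution the upload event triggers the state machine, but you can also start an execution directly with the Step Functions API, which is handy for testing. The sketch below builds the input shape the TranscribeAudio state expects (`$.bucket` and `$.key`); the state machine ARN is a value from your own deployment.

```python
import json

def build_execution_input(bucket, key):
    """Build the JSON input that the TranscribeAudio state reads via $.bucket and $.key."""
    return json.dumps({"bucket": bucket, "key": key})

def start_workflow(state_machine_arn, bucket, key):
    """Start a Step Functions execution for an uploaded recording.

    Requires AWS credentials; state_machine_arn comes from your deployment.
    """
    import boto3  # deferred so build_execution_input stays usable without AWS
    sfn = boto3.client("stepfunctions")
    return sfn.start_execution(
        stateMachineArn=state_machine_arn,
        input=build_execution_input(bucket, key),
    )
```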
After you have successfully completed the deployment, you can use the CloudFront URL to test the solution functionality.
Security is a critical aspect of this solution, and we’ve implemented several best practices to support data protection and compliance:
To prevent unnecessary charges, make sure to delete the resources provisioned for this solution when you're done. Run cdk destroy, which deletes the AWS infrastructure.

This serverless audio summarization solution demonstrates the benefits of combining AWS services to create a sophisticated, secure, and scalable application. By using Amazon Bedrock for AI capabilities, Lambda for serverless processing, and CloudFront for content delivery, we've built a solution that can handle large volumes of audio content efficiently while helping you align with security best practices.
The automatic PII redaction feature supports compliance with privacy regulations, making this solution well-suited for regulated industries such as healthcare, finance, and legal services where data security is paramount. To get started, deploy this architecture within your AWS environment to accelerate your audio processing workflows.