Getting started with computer use in Amazon Bedrock Agents

Computer use is a breakthrough capability from Anthropic that allows foundation models (FMs) to visually perceive and interpret digital interfaces. This capability enables Anthropic’s Claude models to identify what’s on a screen, understand the context of UI elements, and recognize actions that should be performed such as clicking buttons, typing text, scrolling, and navigating between applications. However, the model itself doesn’t execute these actions—it requires an orchestration layer to safely implement the supported actions.

Today, we’re announcing computer use support within Amazon Bedrock Agents using Anthropic’s Claude 3.5 Sonnet V2 and Anthropic’s Claude 3.7 Sonnet models on Amazon Bedrock. This integration brings Anthropic’s visual perception capabilities to Amazon Bedrock Agents as a managed tool, providing you with a secure, traceable, and managed way to implement computer use automation in your workflows.

Organizations across industries struggle with automating repetitive tasks that span multiple applications and systems of record. Whether processing invoices, updating customer records, or managing human resource (HR) documents, these workflows often require employees to manually transfer information between different systems – a process that’s time-consuming, error-prone, and difficult to scale.

Traditional automation approaches require custom API integrations for each application, creating significant development overhead. Computer use capabilities change this paradigm by allowing machines to perceive existing interfaces just as humans do.

In this post, we create a computer use agent demo that provides the critical orchestration layer that transforms computer use from a perception capability into actionable automation. Without this orchestration layer, computer use would only identify potential actions without executing them. The computer use agent demo powered by Amazon Bedrock Agents provides the following benefits:

Secure execution environment – Computer use tools run in a sandbox environment with limited access to the AWS ecosystem and the web. Note that Amazon Bedrock Agents does not currently provide a sandbox environment itself; in this demo, the sandbox is a separate container that you deploy.
Comprehensive logging – Ability to track each action and interaction for auditing and debugging
Detailed tracing capabilities – Visibility into each step of the automated workflow
Simplified testing and experimentation – Reduced risk when working with this experimental capability through managed controls
Seamless orchestration – Coordination of complex workflows across multiple systems without custom code

This integration combines Anthropic’s perceptual understanding of digital interfaces with the orchestration capabilities of Amazon Bedrock Agents, creating a powerful agent for automating complex workflows across applications. Rather than build custom integrations for each system, developers can now create agents that perceive and interact with existing interfaces in a managed, secure way.

With computer use, Amazon Bedrock Agents can automate tasks through basic GUI actions and built-in Linux commands. For example, your agent could take screenshots, create and edit text files, and run shell commands. Using Amazon Bedrock Agents with a compatible Anthropic Claude model, you can use the following action groups:

Computer tool – Enables interactions with user interfaces (clicking, typing, scrolling)
Text editor tool – Provides capabilities to edit and manipulate files
Bash – Allows execution of built-in Linux commands
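
Each of these action groups is attached with a versioned tool type that depends on the model; the exact values appear in the signature table and code sample later in this post. As a minimal sketch, that pairing could be kept in a lookup so the tool type and the model cannot drift apart. The model keys (`claude-3-5-sonnet-v2`, `claude-3-7-sonnet`) and the helper name `action_group_kwargs` are illustrative assumptions, not part of the Amazon Bedrock API.

```python
# Versioned tool types per model, copied from the signature table in this post.
# The dictionary keys are hypothetical labels chosen for this sketch.
TOOL_TYPES = {
    "claude-3-5-sonnet-v2": {
        "ANTHROPIC.Computer": "computer_20241022",
        "ANTHROPIC.TextEditor": "text_editor_20241022",
        "ANTHROPIC.Bash": "bash_20241022",
    },
    "claude-3-7-sonnet": {
        "ANTHROPIC.Computer": "computer_20250124",
        "ANTHROPIC.TextEditor": "text_editor_20250124",
        "ANTHROPIC.Bash": "bash_20250124",
    },
}


def action_group_kwargs(model_key, signature, display=(1024, 768)):
    """Build the signature-related keyword arguments for one action group."""
    params = {"type": TOOL_TYPES[model_key][signature]}
    if signature == "ANTHROPIC.Computer":
        # The computer tool also needs the virtual display geometry,
        # passed as strings as in the code sample in this post.
        width, height = display
        params.update(
            {
                "display_width_px": str(width),
                "display_height_px": str(height),
                "display_number": "1",
            }
        )
    return {
        "parentActionGroupSignature": signature,
        "parentActionGroupSignatureParams": params,
    }
```

The returned dictionary is meant to be merged into a `create_agent_action_group` call alongside `agentId`, `agentVersion`, and `actionGroupName`.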

Solution overview

An example computer use workflow consists of the following steps:

Create an Amazon Bedrock agent and use natural language to describe what the agent should do and how it should interact with users, for example: “You are computer use agent capable of using Firefox web browser for web search.”
Add the Amazon Bedrock Agents supported computer use action groups to your agent using the CreateAgentActionGroup API.
Invoke the agent with a user query that requires computer use tools, for example, “What is Amazon Bedrock, can you search the web?”
The Amazon Bedrock agent uses the tool definitions at its disposal and decides to use the computer action group to take a screenshot of the environment. Using the return control capability of Amazon Bedrock Agents, the agent then responds with the tool or tools that it wants to execute. The return control capability is required for using computer use with Amazon Bedrock Agents.
The workflow parses the agent response and executes the tool returned in a sandbox environment. The output is given back to the Amazon Bedrock agent for further processing.
The Amazon Bedrock agent continues to respond with tools at its disposal until the task is complete.
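
The last three steps form a loop, which can be sketched as follows. This is an illustration under stated assumptions, not the code from the demo repository: `invoke` and `execute_tool` are hypothetical callables that you supply, and the event and session-state shapes (`chunk`, `returnControl`, `invocationInputs`, `returnControlInvocationResults`) follow our reading of the InvokeAgent API, so verify them against the current Boto3 documentation.

```python
def run_until_done(invoke, execute_tool, user_text, max_turns=20):
    """Drive an agent until it stops returning control.

    invoke(input_text, session_state) -> iterable of event dicts
        (hypothetical wrapper around the agent runtime)
    execute_tool(function_input) -> str
        (hypothetical function that runs the tool in the sandbox)
    """
    text_parts = []
    session_state = None
    for _ in range(max_turns):
        return_control = None
        for event in invoke(user_text, session_state):
            if "chunk" in event:
                # Plain text produced by the agent.
                text_parts.append(event["chunk"]["bytes"].decode("utf-8"))
            elif "returnControl" in event:
                # The agent wants one or more tools executed.
                return_control = event["returnControl"]
        if return_control is None:
            # No tool request in this turn: the task is complete.
            return "".join(text_parts)
        results = []
        for item in return_control["invocationInputs"]:
            fn_input = item["functionInvocationInput"]
            results.append(
                {
                    "functionResult": {
                        "actionGroup": fn_input["actionGroup"],
                        "function": fn_input.get("function", ""),
                        "responseBody": {
                            "TEXT": {"body": execute_tool(fn_input)}
                        },
                    }
                }
            )
        # Hand the tool output back to the agent on the next turn.
        session_state = {
            "invocationId": return_control["invocationId"],
            "returnControlInvocationResults": results,
        }
    raise RuntimeError("agent did not finish within max_turns")
```

In a real deployment, `invoke` would typically wrap the `invoke_agent` call of the `bedrock-agent-runtime` client and return the `completion` event stream from its response. The `max_turns` cap is a design choice that keeps a confused agent from looping indefinitely.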

You can recreate this example in the us-west-2 AWS Region with the AWS Cloud Development Kit (AWS CDK) by following the instructions in the GitHub repository. This demo deploys a containerized application using AWS Fargate across two Availability Zones in the us-west-2 Region. The infrastructure operates within a virtual private cloud (VPC) containing public subnets in each Availability Zone, with an internet gateway providing external connectivity. The architecture is complemented by essential supporting services, including AWS Key Management Service (AWS KMS) for security and Amazon CloudWatch for monitoring, creating a resilient, serverless container environment that alleviates the need to manage underlying infrastructure while maintaining robust security and high availability.

The following diagram illustrates the solution architecture.

At the core of our solution are two Fargate containers managed through Amazon Elastic Container Service (Amazon ECS), each protected by its own security group. The first is our orchestration container, which not only handles the communication between Amazon Bedrock Agents and end users, but also orchestrates the workflow that enables tool execution. The second is our environment container, which serves as a secure sandbox where the Amazon Bedrock agent can safely run its computer use tools. The environment container has limited access to the rest of the ecosystem and the internet. We utilize service discovery to connect Amazon ECS services with DNS names.

The orchestration container includes the following components:

Streamlit UI – The Streamlit UI that facilitates interaction between the end user and computer use agent
Return control loop – The workflow responsible for parsing the tools that the agent wants to execute and returning the output of these tools

The environment container includes the following components:

UI and pre-installed applications – A lightweight UI and pre-installed Linux applications like Firefox that can be used to complete the user’s tasks
Tool implementation – Code that can execute computer use tools in the environment, such as “screenshot” or “double-click”
Quart (RESTful) JSON API – A Quart-based API through which the orchestration container executes tools in the sandbox environment
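
To make the split between the tool implementation and the JSON API concrete, the following sketch shows one way a request could be routed to a tool handler inside the environment container. The request shape (`action`, `params`), the handler names, and the placeholder return values are all assumptions made for illustration; the demo repository may structure this differently, and in the demo a Quart route would sit in front of a function like `handle_request`.

```python
import json

# Registry that maps an action name to the function implementing it.
TOOL_HANDLERS = {}


def tool(name):
    """Decorator that registers a handler under an action name."""
    def register(fn):
        TOOL_HANDLERS[name] = fn
        return fn
    return register


@tool("screenshot")
def screenshot(params):
    # Placeholder: a real handler would capture the virtual display.
    return {"image_base64": ""}


@tool("type")
def type_text(params):
    # Placeholder: a real handler would send keystrokes to the display.
    return {"typed": params.get("text", "")}


def handle_request(body):
    """Parse a JSON request, run the matching tool, return a JSON reply."""
    request = json.loads(body)
    action = request.get("action")
    handler = TOOL_HANDLERS.get(action)
    if handler is None:
        return json.dumps({"ok": False, "error": f"unsupported action: {action}"})
    try:
        return json.dumps({"ok": True, "result": handler(request.get("params", {}))})
    except Exception as exc:
        # Report failures to the caller instead of crashing the sandbox API.
        return json.dumps({"ok": False, "error": str(exc)})
```

Rejecting unknown actions explicitly keeps the sandbox API limited to the tools you have chosen to implement.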

The following diagram illustrates these components.

Prerequisites

AWS Command Line Interface (AWS CLI); follow the instructions here. Make sure to set up credentials; follow the instructions here.
Python 3.11 or later.
Node.js 14.15.0 or later.
AWS CDK CLI; follow the instructions here.
Model access enabled for Anthropic’s Claude 3.5 Sonnet V2 and Anthropic’s Claude 3.7 Sonnet.
Boto3 version 1.37.10 or later.

Create an Amazon Bedrock agent with computer use

You can use the following code sample to create a simple Amazon Bedrock agent with computer, bash, and text editor action groups. It is crucial to provide a compatible action group signature when using Anthropic’s Claude 3.5 Sonnet V2 and Anthropic’s Claude 3.7 Sonnet as highlighted here.

Model	Action Group Signature
Anthropic’s Claude 3.5 Sonnet V2	computer_20241022, text_editor_20241022, bash_20241022
Anthropic’s Claude 3.7 Sonnet	computer_20250124, text_editor_20250124, bash_20250124

import boto3
import time

# Step 1: Create the bedrock agent client

bedrock_agent = boto3.client("bedrock-agent", region_name="us-west-2")

# Step 2: Create an agent

# agent_role_arn must already hold the ARN of the agent's IAM execution role.
create_agent_response = bedrock_agent.create_agent(
    agentResourceRoleArn=agent_role_arn,  # Amazon Bedrock Agents execution role
    agentName="computeruse",
    description="""Example agent for computer use.
    This agent should only operate on
    Sandbox environments with limited privileges.""",
    foundationModel="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    instruction="""You are computer use agent capable of using Firefox
                 web browser for web search.""",
)

time.sleep(30) # wait for agent to be created

# Step 3.1: Create and attach computer action group

bedrock_agent.create_agent_action_group(
    actionGroupName="ComputerActionGroup",
    actionGroupState="ENABLED",
    agentId=create_agent_response["agent"]["agentId"],
    agentVersion="DRAFT",
    parentActionGroupSignature="ANTHROPIC.Computer",
    parentActionGroupSignatureParams={
        "type": "computer_20250124",
        "display_height_px": "768",
        "display_width_px": "1024",
        "display_number": "1",
    },
)

# Step 3.2: Create and attach bash action group

bedrock_agent.create_agent_action_group(
    actionGroupName="BashActionGroup",
    actionGroupState="ENABLED",
    agentId=create_agent_response["agent"]["agentId"],
    agentVersion="DRAFT",
    parentActionGroupSignature="ANTHROPIC.Bash",
    parentActionGroupSignatureParams={
        "type": "bash_20250124",
    },
)

# Step 3.3: Create and attach text editor action group

bedrock_agent.create_agent_action_group(
    actionGroupName="TextEditorActionGroup",
    actionGroupState="ENABLED",
    agentId=create_agent_response["agent"]["agentId"],
    agentVersion="DRAFT",
    parentActionGroupSignature="ANTHROPIC.TextEditor",
    parentActionGroupSignatureParams={
        "type": "text_editor_20250124",
    },
)

# Step 3.4: Create a custom weather action group that uses return of control

bedrock_agent.create_agent_action_group(
    actionGroupName="WeatherActionGroup",
    agentId=create_agent_response["agent"]["agentId"],
    agentVersion="DRAFT",
    actionGroupExecutor={
        "customControl": "RETURN_CONTROL",
    },
    functionSchema={
        "functions": [
            {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location.",
                "parameters": {
                    "location": {
                        "type": "string",
                        "description": "The city, e.g., San Francisco",
                        "required": True,
                    },
                    "unit": {
                        "type": "string",
                        "description": (
                            "The unit to use, e.g., fahrenheit or celsius. "
                            'Defaults to "fahrenheit"'
                        ),
                        "required": False,
                    },
                },
                "requireConfirmation": "DISABLED",
            }
        ]
    },
)

time.sleep(10)  # wait for the action groups to be attached

# Step 4: Prepare agent

bedrock_agent.prepare_agent(agentId=create_agent_response["agent"]["agentId"])
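
The fixed `time.sleep` calls in the sample are simple but can be too short or needlessly long. As an alternative sketch, you could poll the agent status until it reaches the state you expect. The helper name `wait_for_status` and its parameters are hypothetical, and the status names used in the usage note (`NOT_PREPARED`, `PREPARED`, `FAILED`) reflect our understanding of the values returned by `get_agent`, so confirm them in the Boto3 documentation.

```python
import time


def wait_for_status(get_status, target, failed=("FAILED",),
                    timeout=120.0, interval=2.0, sleep=time.sleep):
    """Poll get_status() until it returns target, fails, or times out."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == target:
            return status
        if status in failed:
            raise RuntimeError(f"agent entered status {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"timed out waiting for {target}; last status {status}")
        sleep(interval)
```

For example, after `create_agent` you might call `wait_for_status(lambda: bedrock_agent.get_agent(agentId=agent_id)["agent"]["agentStatus"], "NOT_PREPARED")`, and after `prepare_agent` wait for `"PREPARED"`, where `agent_id` is the ID returned by `create_agent`.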

Example use case

In this post, we demonstrate an example where we use Amazon Bedrock Agents with the computer use capability to complete a web form. In the example, the computer use agent can also switch Firefox tabs to interact with a customer relationship management (CRM) agent to get the required information to complete the form. Although this example uses a sample CRM application as the system of record, the same approach works with Salesforce, SAP, Workday, or other systems of record with the appropriate authentication frameworks in place.

In the demonstrated use case, you can observe how well the Amazon Bedrock agent performed with computer use tools. Our implementation completed the customer ID, customer name, and email by visually examining the Excel data. However, for the overview, it decided to select the cell and copy the data, because the information wasn’t completely visible on the screen. Finally, the CRM agent was used to get additional information on the customer.

Best practices

The following are some ways you can improve the performance for your use case:

Implement Security Groups, Network Access Control Lists (NACLs), and Amazon Route 53 Resolver DNS Firewall domain lists to control access to the sandbox environment.
Apply AWS Identity and Access Management (IAM) and the principle of least privilege to assign limited permissions to the sandbox environment.
When providing the Amazon Bedrock agent with instructions, be concise and direct. Specify simple, well-defined tasks and provide explicit instructions for each step.
Understand computer use limitations as highlighted by Anthropic here.
Complement return of control with user confirmation to help safeguard your application from malicious prompt injections by requesting confirmation from your users before invoking a computer use tool.
Use multi-agent collaboration and computer use with Amazon Bedrock Agents to automate complex workflows.
Implement safeguards by filtering harmful multimodal content based on your responsible AI policies for your application by associating Amazon Bedrock Guardrails with your agent.
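
The user confirmation practice above can also be enforced in your own orchestration code. The following is a minimal sketch, assuming a hypothetical `execute_tool` callable that runs a tool request and a hypothetical `ask_user` callable that shows a summary and returns `True` or `False`. It complements, and does not replace, the `requireConfirmation` setting that Amazon Bedrock Agents exposes on function definitions.

```python
def gated_execute(fn_input, execute_tool, ask_user):
    """Run a tool request only after the user approves it."""
    # Summarize what the agent wants to do so the user can make a decision.
    summary = f"{fn_input['actionGroup']}: {fn_input.get('parameters', [])}"
    if not ask_user(summary):
        # Return a plain-text result so the agent learns the action was refused.
        return "User declined this action."
    return execute_tool(fn_input)
```

Returning a refusal message instead of raising an exception lets the agent continue the conversation and propose a different step.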

Considerations

The computer use feature is made available to you as a beta service as defined in the AWS Service Terms. It is subject to your agreement with AWS and the AWS Service Terms, and the applicable model EULA. Computer use poses unique risks that are distinct from standard API features or chat interfaces. These risks are heightened when using the computer use feature to interact with the internet. To minimize risks, consider taking precautions such as:

Operate computer use functionality in a dedicated virtual machine or container with minimal privileges to minimize direct system exploits or accidents
To help prevent information theft, avoid giving the computer use API access to sensitive accounts or data
Limit the computer use API’s internet access to required domains to reduce exposure to malicious content
To enforce proper oversight, keep a human in the loop for sensitive tasks (such as making decisions that could have meaningful real-world consequences) and for anything requiring affirmative consent (such as accepting cookies, executing financial transactions, or agreeing to terms of service)
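
Limiting internet access is best done at the network layer, but an application-level check can add a second line of defense before a navigation request reaches the sandbox. The following sketch assumes you maintain your own set of allowed domains; the function name `is_allowed` and the example domains in the usage note are illustrative only.

```python
from urllib.parse import urlsplit


def is_allowed(url, allowed_domains):
    """Return True if url is http(s) and its host is an allowed domain or subdomain."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        # Reject file://, javascript:, and other non-web schemes.
        return False
    host = (parts.hostname or "").lower().rstrip(".")
    # Match the domain itself or any subdomain, but not look-alike suffixes.
    return any(host == d or host.endswith("." + d) for d in allowed_domains)
```

For instance, with `{"amazon.com"}` as the allowed set, `https://docs.aws.amazon.com/bedrock` passes, while `https://evilamazon.com` and `file:///etc/passwd` are rejected.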

Any content that you enable Anthropic’s Claude to see or access can potentially override instructions or cause the model to make mistakes or perform unintended actions. Taking proper precautions, such as isolating Anthropic’s Claude from sensitive surfaces, is essential – including to avoid risks related to prompt injection. Before enabling or requesting permissions necessary to enable computer use features in your own products, inform end users of any relevant risks, and obtain their consent as appropriate.

Clean up

When you are done using this solution, make sure to clean up all the resources. Follow the instructions in the provided GitHub repository.

Conclusion

Organizations across industries face significant challenges with cross-application workflows that traditionally require manual data entry or complex custom integrations. The integration of Anthropic’s computer use capability with Amazon Bedrock Agents represents a transformative approach to these challenges.

By using Amazon Bedrock Agents as the orchestration layer, organizations can alleviate the need for custom API development for each application, benefit from comprehensive logging and tracing capabilities essential for enterprise deployment, and implement automation solutions quickly.

As you begin exploring computer use with Amazon Bedrock Agents, consider workflows in your organization that could benefit from this approach. From invoice processing to customer onboarding, HR documentation to compliance reporting, the potential applications are vast and transformative.

We’re excited to see how you will use Amazon Bedrock Agents with the computer use capability to securely streamline operations and reimagine business processes through AI-driven automation.

About the Authors

Eashan Kaushik is a Specialist Solutions Architect AI/ML at Amazon Web Services. He is driven by creating cutting-edge generative AI solutions while prioritizing a customer-centric approach to his work. Before this role, he obtained an MS in Computer Science from NYU Tandon School of Engineering. Outside of work, he enjoys sports, lifting, and running marathons.

Maira Ladeira Tanke is a Tech Lead for Agentic workloads in Amazon Bedrock at AWS, where she enables customers on their journey to develop autonomous AI systems. With over 10 years of experience in AI/ML, Maira partners with enterprise customers to accelerate the adoption of agentic applications using Amazon Bedrock, helping organizations harness the power of foundation models to drive innovation and business transformation. In her free time, Maira enjoys traveling, playing with her cat, and spending time with her family someplace warm.

Raj Pathak is a Principal Solutions Architect and Technical advisor to Fortune 50 and Mid-Sized FSI (Banking, Insurance, Capital Markets) customers across Canada and the United States. Raj specializes in Machine Learning with applications in Generative AI, Natural Language Processing, Intelligent Document Processing, and MLOps.

Adarsh Srikanth is a Software Development Engineer at Amazon Bedrock, where he develops AI agent services. He holds a master’s degree in computer science from USC and brings three years of industry experience to his role. He spends his free time exploring national parks, discovering new hiking trails, and playing various racquet sports.

Abishek Kumar is a Senior Software Engineer at Amazon, bringing over 6 years of valuable experience across both retail and AWS organizations. He has demonstrated expertise in developing generative AI and machine learning solutions, specifically contributing to key AWS services including SageMaker Autopilot, SageMaker Canvas, and Amazon Bedrock Agents. Throughout his career, Abishek has shown passion for solving complex problems and architecting large-scale systems that serve millions of customers worldwide. When not immersed in technology, he enjoys exploring nature through hiking and traveling adventures with his wife.

Krishna Gourishetti is a Senior Software Engineer for the Bedrock Agents team in AWS. He is passionate about building scalable software solutions that solve customer problems. In his free time, Krishna loves to go on hikes.
