Computer use is a breakthrough capability from Anthropic that allows foundation models (FMs) to visually perceive and interpret digital interfaces. This capability enables Anthropic’s Claude models to identify what’s on a screen, understand the context of UI elements, and recognize actions that should be performed such as clicking buttons, typing text, scrolling, and navigating between applications. However, the model itself doesn’t execute these actions—it requires an orchestration layer to safely implement the supported actions.
Today, we’re announcing computer use support within Amazon Bedrock Agents using Anthropic’s Claude 3.5 Sonnet V2 and Anthropic’s Claude 3.7 Sonnet models on Amazon Bedrock. This integration makes Anthropic’s visual perception capabilities available as a managed tool within Amazon Bedrock Agents, providing you with a secure, traceable, and managed way to implement computer use automation in your workflows.
Organizations across industries struggle with automating repetitive tasks that span multiple applications and systems of record. Whether processing invoices, updating customer records, or managing human resources (HR) documents, these workflows often require employees to manually transfer information between different systems – a process that’s time-consuming, error-prone, and difficult to scale.
Traditional automation approaches require custom API integrations for each application, creating significant development overhead. Computer use capabilities change this paradigm by allowing machines to perceive existing interfaces just as humans do.
In this post, we create a computer use agent demo that provides the critical orchestration layer that transforms computer use from a perception capability into actionable automation. Without this orchestration layer, computer use would only identify potential actions without executing them. The computer use agent demo powered by Amazon Bedrock Agents provides the following benefits:
This integration combines Anthropic’s perceptual understanding of digital interfaces with the orchestration capabilities of Amazon Bedrock Agents, creating a powerful agent for automating complex workflows across applications. Rather than build custom integrations for each system, developers can now create agents that perceive and interact with existing interfaces in a managed, secure way.
With computer use, Amazon Bedrock Agents can automate tasks through basic GUI actions and built-in Linux commands. For example, your agent could take screenshots, create and edit text files, and run shell commands. With Amazon Bedrock Agents and a compatible Claude model from Anthropic, you can use the following action groups:
An example computer use workflow consists of the following steps:
You can recreate this example in the us-west-2 AWS Region with the AWS Cloud Development Kit (AWS CDK) by following the instructions in the GitHub repository. The demo deploys a containerized application on AWS Fargate across two Availability Zones. The infrastructure runs in a virtual private cloud (VPC) with a public subnet in each Availability Zone and an internet gateway for external connectivity. Supporting services include AWS Key Management Service (AWS KMS) for security and Amazon CloudWatch for monitoring. Together they form a resilient, serverless container environment that removes the need to manage the underlying infrastructure while maintaining security and high availability.
The following diagram illustrates the solution architecture.
At the core of our solution are two Fargate containers managed through Amazon Elastic Container Service (Amazon ECS), each protected by its own security group. The first is our orchestration container, which not only handles the communication between Amazon Bedrock Agents and end users, but also orchestrates the workflow that enables tool execution. The second is our environment container, which serves as a secure sandbox where the Amazon Bedrock agent can safely run its computer use tools. The environment container has limited access to the rest of the ecosystem and the internet. We utilize service discovery to connect Amazon ECS services with DNS names.
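The split between the two containers can be pictured as a loop: the model proposes an action, the orchestration side dispatches it to a handler that runs in the sandbox, and the result is fed back for the next step. The following is a minimal sketch of that pattern only. It is not the demo's actual code and it does not call Amazon Bedrock. The `ToolRequest` shape, the `next_step` callable, and the handler names are all hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ToolRequest:
    """An action proposed by the model (hypothetical shape, for illustration)."""
    tool: str        # for example "computer", "bash", or "text_editor"
    arguments: dict


# A handler runs one tool call inside the sandbox and returns a text result.
ToolHandler = Callable[[dict], str]


def run_orchestration_loop(
    next_step: Callable[[List[str]], Optional[ToolRequest]],
    handlers: Dict[str, ToolHandler],
    max_steps: int = 10,
) -> List[str]:
    """Ask for an action, execute it, feed the result back, and repeat."""
    results: List[str] = []
    for _ in range(max_steps):
        request = next_step(results)
        if request is None:  # the model signals that the task is done
            break
        handler = handlers.get(request.tool)
        if handler is None:
            # Unknown tools are refused rather than executed.
            results.append(f"error: unsupported tool {request.tool!r}")
            continue
        results.append(handler(request.arguments))
    return results
```

In a real deployment, `next_step` would wrap the call to the agent and each handler would forward the request to the environment container. Capping the loop with `max_steps` and refusing unknown tools are design choices that keep the orchestration side in control of what actually runs.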
The orchestration container includes the following components:
The environment container includes the following components:
The following diagram illustrates these components.
You can use the following code sample to create a simple Amazon Bedrock agent with computer, bash, and text editor action groups. It is crucial to provide an action group signature that is compatible with the model you use. The following table lists the signatures for Anthropic’s Claude 3.5 Sonnet V2 and Anthropic’s Claude 3.7 Sonnet.
| Model | Action Group Signature |
| --- | --- |
| Anthropic’s Claude 3.5 Sonnet V2 | computer_20241022, text_editor_20241022, bash_20241022 |
| Anthropic’s Claude 3.7 Sonnet | computer_20250124, text_editor_20250124, bash_20250124 |
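As a rough illustration of how these signatures might be supplied, the sketch below builds one `create_agent_action_group` request per tool for the boto3 `bedrock-agent` client. The tool type strings come from the table above. The dictionary keys used as model labels, the action group names, the display size, and the display number are assumptions made for this example. The `ANTHROPIC.Computer`, `ANTHROPIC.TextEditor`, and `ANTHROPIC.Bash` parent signatures and the `parentActionGroupSignatureParams` field reflect our reading of the Amazon Bedrock Agents API and should be checked against the current API reference before use.

```python
from typing import Dict, List

# Tool type strings from the table above, keyed by a hypothetical model label.
TOOL_TYPES: Dict[str, Dict[str, str]] = {
    "claude-3-5-sonnet-v2": {
        "ANTHROPIC.Computer": "computer_20241022",
        "ANTHROPIC.TextEditor": "text_editor_20241022",
        "ANTHROPIC.Bash": "bash_20241022",
    },
    "claude-3-7-sonnet": {
        "ANTHROPIC.Computer": "computer_20250124",
        "ANTHROPIC.TextEditor": "text_editor_20250124",
        "ANTHROPIC.Bash": "bash_20250124",
    },
}


def build_action_group_requests(
    agent_id: str, model: str, width: int = 1024, height: int = 768
) -> List[dict]:
    """Return one request dict per computer use action group (no AWS calls)."""
    requests = []
    for signature, tool_type in TOOL_TYPES[model].items():
        params = {"type": tool_type}
        if signature == "ANTHROPIC.Computer":
            # Display settings are passed as strings; the values are assumptions.
            params.update({
                "display_width_px": str(width),
                "display_height_px": str(height),
                "display_number": "1",
            })
        requests.append({
            "agentId": agent_id,
            "agentVersion": "DRAFT",
            "actionGroupName": signature.split(".")[1] + "ActionGroup",
            "actionGroupState": "ENABLED",
            "parentActionGroupSignature": signature,
            "parentActionGroupSignatureParams": params,
        })
    return requests


def create_action_groups(agent_id: str, model: str) -> None:
    """Send the requests to Amazon Bedrock (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without AWS access

    client = boto3.client("bedrock-agent", region_name="us-west-2")
    for request in build_action_group_requests(agent_id, model):
        client.create_agent_action_group(**request)
```

Keeping the request builder separate from the AWS call makes it easy to inspect or unit test the signatures for a given model before anything is created in your account.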
In this post, we demonstrate an example where we use Amazon Bedrock Agents with the computer use capability to complete a web form. In the example, the computer use agent can also switch Firefox tabs to interact with a customer relationship management (CRM) agent to get the required information to complete the form. Although this example uses a sample CRM application as the system of record, the same approach works with Salesforce, SAP, Workday, or other systems of record with the appropriate authentication frameworks in place.
In the demonstrated use case, you can observe how the Amazon Bedrock agent performed with the computer use tools. Our implementation filled in the customer ID, customer name, and email fields by visually examining the Excel data. For the overview field, however, it chose to select the cell and copy the data, because the information wasn’t completely visible on the screen. Finally, it used the CRM agent to get additional information on the customer.
The following are some ways you can improve the performance for your use case:
The computer use feature is made available to you as a beta service as defined in the AWS Service Terms. It is subject to your agreement with AWS and the AWS Service Terms, and the applicable model EULA. Computer use poses unique risks that are distinct from standard API features or chat interfaces. These risks are heightened when using the computer use feature to interact with the internet. To minimize risks, consider taking precautions such as:
Any content that you enable Anthropic’s Claude to see or access can potentially override instructions or cause the model to make mistakes or perform unintended actions. Taking proper precautions, such as isolating Anthropic’s Claude from sensitive surfaces, is essential – including to avoid risks related to prompt injection. Before enabling or requesting permissions necessary to enable computer use features in your own products, inform end users of any relevant risks, and obtain their consent as appropriate.
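One precaution in the spirit of the isolation advice above is to restrict which hosts the sandbox browser may visit. The following is a minimal sketch of such a check, assuming a hypothetical allowlist that the orchestration layer consults before executing a navigation action. The function name and the example hosts are illustrative and are not part of the demo. A network-level control, such as security group or proxy rules, would still be needed for real enforcement.

```python
from typing import Set
from urllib.parse import urlparse


def is_allowed_url(url: str, allowed_hosts: Set[str]) -> bool:
    """Return True only for http(s) URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False  # rejects file://, javascript:, and similar schemes
    host = (parsed.hostname or "").lower()
    # Accept an exact match or a subdomain of an allowed host.
    return any(
        host == allowed or host.endswith("." + allowed)
        for allowed in allowed_hosts
    )
```

Matching on the parsed hostname, rather than on a substring of the URL, avoids accepting look-alike addresses that merely contain an allowed name.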
When you are done using this solution, make sure to clean up all the resources. Follow the instructions in the provided GitHub repository.
Organizations across industries face significant challenges with cross-application workflows that traditionally require manual data entry or complex custom integrations. The integration of Anthropic’s computer use capability with Amazon Bedrock Agents represents a transformative approach to these challenges.
By using Amazon Bedrock Agents as the orchestration layer, organizations can avoid custom API development for each application, benefit from the comprehensive logging and tracing capabilities essential for enterprise deployment, and implement automation solutions quickly.
As you begin exploring computer use with Amazon Bedrock Agents, consider workflows in your organization that could benefit from this approach. From invoice processing to customer onboarding, HR documentation to compliance reporting, the potential applications are vast and transformative.
We’re excited to see how you will use Amazon Bedrock Agents with the computer use capability to securely streamline operations and reimagine business processes through AI-driven automation.
Resources
To learn more, refer to the following resources: