Organizations are increasingly integrating generative AI capabilities into their applications to enhance customer experiences, streamline operations, and drive innovation. As generative AI workloads continue to grow in scale and importance, organizations face new challenges in maintaining consistent performance, reliability, and availability of their AI-powered applications. Customers are looking to scale their AI inference workloads across multiple AWS Regions to support consistent performance and reliability.
To address this need, we introduced cross-Region inference (CRIS) for Amazon Bedrock. This managed capability automatically routes inference requests across multiple Regions, enabling applications to handle traffic bursts seamlessly and achieve higher throughput without requiring developers to predict demand fluctuations or implement complex load-balancing mechanisms. CRIS works through inference profiles, which define a foundation model (FM) and the Regions to which requests can be routed.
We are excited to announce the availability of global cross-Region inference with Anthropic’s Claude Sonnet 4.5 on Amazon Bedrock. With cross-Region inference, you can now choose either a geography-specific inference profile, where Amazon Bedrock automatically selects the optimal commercial Region within that geography to process your request, or a global inference profile. Global CRIS extends cross-Region inference by routing inference requests to supported commercial Regions worldwide, optimizing available resources and enabling higher model throughput. This helps support consistent performance and higher throughput, particularly during unplanned peak usage times. Additionally, global CRIS supports key Amazon Bedrock features, including prompt caching, batch inference, Amazon Bedrock Guardrails, Amazon Bedrock Knowledge Bases, and more.
In this post, we explore how global cross-Region inference works, the benefits it offers compared to Regional profiles, and how you can implement it in your own applications with Anthropic’s Claude Sonnet 4.5 to improve your AI applications’ performance and reliability.
Global cross-Region inference helps organizations manage unplanned traffic bursts by using compute resources across different Regions. This section explores how this feature works and the technical mechanisms that power its functionality.
An inference profile in Amazon Bedrock defines an FM and one or more Regions to which it can route model invocation requests. The global cross-Region inference profile for Anthropic’s Claude Sonnet 4.5 extends this concept beyond geographic boundaries, allowing requests to be routed to one of the supported Amazon Bedrock commercial Regions globally, so you can prepare for unplanned traffic bursts by distributing traffic across multiple Regions.
Inference profiles operate on two key concepts: the source Region, where your inference request originates, and the destination Region, where the request is actually processed. At the time of writing, global CRIS supports over 20 source Regions, and the destination Region is a supported commercial Region dynamically chosen by Amazon Bedrock.
Global cross-Region inference uses an intelligent request routing mechanism that considers multiple factors, including model availability, capacity, and latency, to route requests to the optimal Region. The system automatically selects the optimal available Region for your request without requiring manual configuration.
This intelligent routing system enables Amazon Bedrock to distribute traffic dynamically across the AWS global infrastructure, facilitating optimal availability for each request and smoother performance during high-usage periods.
When using global cross-Region inference, Amazon CloudWatch and AWS CloudTrail continue to record log entries only in the source Region where the request originated. This simplifies monitoring and logging by maintaining all records in a single Region regardless of where the inference request is ultimately processed. To track which Region processed a request, CloudTrail events include an `additionalEventData` field with an `inferenceRegion` key that specifies the destination Region. Organizations can use this to monitor and analyze the distribution of their inference requests across the AWS global infrastructure.
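As a minimal sketch (assuming events retrieved with the CloudTrail `lookup_events` API, whose `CloudTrailEvent` field is a JSON string), the destination Region can be extracted and tallied like this; the helper names are illustrative:

```python
import json

def extract_inference_region(cloudtrail_event_json: str):
    """Return the destination Region recorded under additionalEventData.inferenceRegion,
    or None if the event has no such key (e.g., a non-CRIS request)."""
    event = json.loads(cloudtrail_event_json)
    return event.get("additionalEventData", {}).get("inferenceRegion")

def inference_region_counts(events):
    """Tally how many requests each destination Region processed.

    `events` is an iterable of records as returned by
    boto3.client("cloudtrail").lookup_events(), each containing a
    "CloudTrailEvent" JSON string.
    """
    counts = {}
    for record in events:
        region = extract_inference_region(record["CloudTrailEvent"])
        if region:
            counts[region] = counts.get(region, 0) + 1
    return counts
```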
Global cross-Region inference maintains high standards for data security. Data transmitted during cross-Region inference is encrypted and remains within the secure AWS network. Sensitive information remains protected throughout the inference process, regardless of which Region processes the request. Because security and compliance are a shared responsibility, you must also consider legal or compliance requirements that come with processing inference requests in a different geographic location. Because global cross-Region inference allows requests to be routed globally, organizations with specific data residency or compliance requirements can elect, based on their compliance needs, to use geography-specific inference profiles to make sure data remains within certain Regions. This flexibility helps businesses balance redundancy and compliance needs based on their specific requirements.
To use global cross-Region inference with Anthropic’s Claude Sonnet 4.5, developers must complete the following key steps:

- Use the global inference profile ID (`global.anthropic.claude-sonnet-4-5-20250929-v1:0`) instead of a Region-specific model ID. This works with both the `InvokeModel` and `Converse` APIs.
- Configure IAM permissions that allow invoking the global inference profile (discussed later in this post).

Implementing global cross-Region inference with Anthropic’s Claude Sonnet 4.5 is straightforward, requiring only a few changes to your existing application code. The following is an example of how to update your code in Python:
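A minimal sketch of such an update using the `Converse` API; the helper function is hypothetical, the AWS call requires credentials, and `us-east-1` is used as an illustrative source Region:

```python
# Global inference profile ID for Anthropic's Claude Sonnet 4.5
GLOBAL_PROFILE_ID = "global.anthropic.claude-sonnet-4-5-20250929-v1:0"

def build_converse_request(prompt: str, model_id: str = GLOBAL_PROFILE_ID) -> dict:
    """Build keyword arguments for the Bedrock Converse API.

    Compared to a Region-specific call, only the modelId changes.
    """
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": 512, "temperature": 0.5},
    }

def converse_global(prompt: str, region: str = "us-east-1") -> str:
    """Invoke the model through the global inference profile (requires AWS credentials)."""
    import boto3  # imported here so the pure helper above has no SDK dependency
    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.converse(**build_converse_request(prompt))
    return response["output"]["message"]["content"][0]["text"]
```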
If you’re using the Amazon Bedrock InvokeModel API, you can quickly switch to a different model by changing the model ID, as shown in Invoke model code examples.
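For the `InvokeModel` path, a hedged sketch of the same switch (the request body shape follows Anthropic's Messages API on Bedrock; parameter values and helper names are illustrative):

```python
import json

GLOBAL_PROFILE_ID = "global.anthropic.claude-sonnet-4-5-20250929-v1:0"

def build_invoke_model_body(prompt: str, max_tokens: int = 512) -> str:
    """Build the JSON request body for InvokeModel with an Anthropic model."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    })

def invoke_global(prompt: str, region: str = "us-east-1") -> str:
    """Call InvokeModel with the global inference profile ID (requires AWS credentials)."""
    import boto3  # imported here so the pure helper above has no SDK dependency
    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.invoke_model(
        modelId=GLOBAL_PROFILE_ID,  # only this changes vs. a Region-specific call
        body=build_invoke_model_body(prompt),
    )
    return json.loads(response["body"].read())["content"][0]["text"]
```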
In this section, we discuss the IAM policy requirements for global CRIS.
To enable global CRIS for your users, you must apply a three-part IAM policy to the role. The following is an example IAM policy to provide granular control. You can replace `<REQUESTING REGION>` in the example policy with the Region you are operating in.
The first part of the policy grants access to the Regional inference profile in your requesting Region, allowing users to invoke the specified global CRIS inference profile from that Region. The second part grants access to the Regional FM resource, which the service needs to understand which model is being requested within the Regional context. The third part grants access to the global FM resource, which enables the cross-Region routing that makes global CRIS function. When implementing these policies, make sure all three resource Amazon Resource Names (ARNs) are included in your IAM statements:

- `arn:aws:bedrock:REGION:ACCOUNT:inference-profile/global.MODEL-NAME` – gives access to the global inference profile in the source Region.
- `arn:aws:bedrock:REGION::foundation-model/MODEL-NAME` – gives access to the FM in the source Region.
- `arn:aws:bedrock:::foundation-model/MODEL-NAME` – gives access to the FM in different global Regions.

The global FM ARN has no Region or account specified, which is intentional and required for the cross-Region functionality.
To simplify onboarding, global CRIS doesn’t require complex changes to an organization’s existing Service Control Policies (SCPs) that might deny access to services in certain Regions. When you opt in to global CRIS using this three-part policy structure, Amazon Bedrock will process inference requests across commercial Regions without validating against Regions denied in other parts of SCPs. This prevents workload failures that could occur when global CRIS routes inference requests to new or previously unused Regions that might be blocked in your organization’s SCPs. However, if you have data residency requirements, you should carefully evaluate your use cases before implementing global CRIS, because requests might be processed in any supported commercial Region.
You can choose from two primary approaches to implement deny policies for global CRIS for specific IAM roles, each with different use cases and implications. One approach is to add a deny statement with a `StringEquals` condition matching the pattern `"aws:RequestedRegion": "unspecified"`. This pattern specifically targets inference profiles with the `global` prefix.

When implementing deny policies, it’s crucial to understand that global CRIS changes how the `aws:RequestedRegion` condition key behaves. Traditional Region-based deny policies that use `StringEquals` conditions with specific Region names, such as `"aws:RequestedRegion": "us-west-2"`, will not work as expected with global CRIS, because the service does not set this field to the actual destination Region. However, as mentioned earlier, a condition matching `"aws:RequestedRegion": "unspecified"` will result in the deny effect.
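A sketch of such a deny statement, assuming the Bedrock invocation actions are what you want to restrict (the `Sid` and action list are illustrative):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyGlobalCRIS",
      "Effect": "Deny",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": { "aws:RequestedRegion": "unspecified" }
      }
    }
  ]
}
```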
As a best practice, organizations that use geographic CRIS but want to opt out of global CRIS should implement the deny policy approach described above, using the `"aws:RequestedRegion": "unspecified"` condition.
When using global CRIS inference profiles, it’s important to understand that service quota management is centralized in the US East (N. Virginia) Region, even though you can use global CRIS from over 20 supported source Regions. Because the limit is global, requests to view, manage, or increase quotas for global cross-Region inference profiles must be made through the Service Quotas console or the AWS Command Line Interface (AWS CLI) specifically in the US East (N. Virginia) Region. Quotas for global CRIS inference profiles will not appear in the Service Quotas console or AWS CLI for other source Regions, even when those Regions support global CRIS usage. This centralized approach lets you manage your limits globally without estimating usage in individual Regions. If you don’t have access to US East (N. Virginia), reach out to your account team or AWS Support.
To request a limit increase, open the Service Quotas console in the US East (N. Virginia) Region, search for Amazon Bedrock, select the quota associated with the global cross-Region inference profile, and choose Request increase at account level.
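Alternatively, you can inspect the centrally managed quotas programmatically. A sketch with boto3 targeting us-east-1; the `bedrock` service code is an assumption to confirm against `aws service-quotas list-services` in your account:

```python
# Global CRIS quotas are managed only in US East (N. Virginia)
GLOBAL_QUOTA_REGION = "us-east-1"

def list_bedrock_quotas():
    """List Amazon Bedrock service quotas from the Region that holds global CRIS limits.

    Requires AWS credentials; paginates through all quota pages.
    """
    import boto3  # local import keeps the constant above usable without the SDK
    client = boto3.client("service-quotas", region_name=GLOBAL_QUOTA_REGION)
    quotas = []
    paginator = client.get_paginator("list_service_quotas")
    for page in paginator.paginate(ServiceCode="bedrock"):
        quotas.extend(page["Quotas"])
    return quotas
```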
Claude Sonnet 4.5 is Anthropic’s most intelligent model (at the time of writing), and is best for coding and complex agents. Anthropic’s Claude Sonnet 4.5 demonstrates advancements in agent capabilities, with enhanced performance in tool handling, memory management, and context processing. The model shows marked improvements in code generation and analysis, including identifying optimal improvements and exercising stronger judgment in refactoring decisions. It particularly excels at autonomous long-horizon coding tasks, where it can effectively plan and execute complex software projects spanning hours or days while maintaining consistent performance and reliability throughout the development cycle.
Global cross-Region inference for Anthropic’s Claude Sonnet 4.5 delivers multiple advantages over traditional geographic cross-Region inference profiles.
If you’re currently using Anthropic’s Sonnet models on Amazon Bedrock, upgrading to Claude Sonnet 4.5 is a great opportunity to enhance your AI capabilities. It offers a significant leap in intelligence and capability as a straightforward, drop-in replacement at a price point comparable to Claude Sonnet 4. The primary reason to switch is Claude Sonnet 4.5’s superior performance across critical, high-value domains. It is Anthropic’s most powerful model so far for building complex agents, demonstrating state-of-the-art performance in coding, reasoning, and computer use. Furthermore, its advanced agentic capabilities, such as extended autonomous operation and more effective use of parallel tool calls, enable the creation of more sophisticated AI workflows.
Amazon Bedrock global cross-Region inference for Anthropic’s Claude Sonnet 4.5 marks a significant evolution in AWS generative AI capabilities, enabling global routing of inference requests across the AWS worldwide infrastructure. With straightforward implementation and comprehensive monitoring through CloudTrail and CloudWatch, organizations can quickly use this powerful capability for their AI applications, high-volume workloads, and disaster recovery scenarios.

We encourage you to try global cross-Region inference with Anthropic’s Claude Sonnet 4.5 in your own applications and experience the benefits firsthand. Start by updating your code to use the global inference profile ID, configuring appropriate IAM permissions, and monitoring your application’s performance as it uses the AWS global infrastructure to deliver enhanced resilience.
For more information about global cross-Region inference for Anthropic’s Claude Sonnet 4.5 in Amazon Bedrock, refer to Increase throughput with cross-Region inference, Supported Regions and models for inference profiles, and Use an inference profile in model invocation.