As companies of various sizes adopt graphics processing unit (GPU)-based machine learning (ML) training, fine-tuning, and inference workloads, the demand for GPU capacity has outpaced industry-wide supply. This imbalance has made GPUs a scarce resource, creating a challenge for customers who need reliable access to GPU compute for their ML workloads.
When you encounter GPU capacity limitations, you might consider creating on-demand capacity reservations (ODCRs). ODCRs are designed for planned, steady-state workloads with well-understood usage patterns. Short-term ODCR availability for GPU instances, particularly P-type instances, is often limited. Additionally, without a long-term contract, ODCRs are billed at on-demand rates and offer no cost advantage. This makes them unsuitable for short or exploratory workloads such as testing, evaluations, or events, so you need a deliberate approach to secure short-term GPU capacity.
In this post, you will learn how to secure reserved GPU capacity for short-term workloads using Amazon Elastic Compute Cloud (Amazon EC2) Capacity Blocks for ML and Amazon SageMaker training plans. These solutions can address GPU availability challenges when you need short-term capacity for load testing, model validation, time-bound workshops, or preparing inference capacity ahead of a release.
There are several ways to access GPU capacity on AWS for short-term workloads:
On-demand instances are usually the first option for short-term GPU usage. If capacity is available at launch time, you can start using GPU instances immediately without prior commitment. This works well for ad hoc experiments, short tests, and development tasks.
On-demand GPU capacity depends on regional supply and current demand, and availability can change quickly. If you stop or scale down an instance, you might not be able to reacquire the same capacity when needed again. This uncertainty often leads to keeping GPU instances running longer than needed, which can increase cost. Choose on-demand instances when your workload can tolerate potential launch delays or when timing is flexible.
Spot Instances can reduce your GPU compute costs by up to 90%, but they trade cost savings for availability certainty. Spot capacity comes from unused Amazon EC2 capacity in the AWS Region, and instances can be interrupted when Amazon EC2 needs the capacity back. As a result, Spot Instances are suitable only for workloads that can handle interruption.
For ML workloads, Spot Instances work well when you can checkpoint progress and restart. Recommended use cases include distributed training jobs with periodic checkpoints, batch inference workloads that can be retried, and workshop environments designed to tolerate partial capacity.
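As a minimal sketch (the AMI ID and instance type are placeholders), you can request a one-time Spot Instance from the AWS CLI and let it terminate on interruption:

aws ec2 run-instances \
--image-id your-ami-id \
--instance-type g5.2xlarge \
--count 1 \
--instance-market-options 'MarketType=spot,SpotOptions={SpotInstanceType=one-time,InstanceInterruptionBehavior=terminate}'

Pair a launch like this with periodic checkpointing to durable storage, such as Amazon S3, so an interrupted job can resume on a replacement instance.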
Amazon EC2 Capacity Blocks for ML reserves GPU capacity for a specific time window so that the requested instances are available when you launch them during the reserved period. Unlike ODCRs, Capacity Blocks are fully self-service and offer better short-term availability for GPU instances, at a 40-50% discounted rate. Each Capacity Block represents a reservation for a specific number of instances of a selected instance type for a defined duration.
Capacity Blocks apply to workloads that run directly on Amazon EC2, where you manage the operating system, networking, and orchestration layers yourself.
Service level agreement (SLA) and hardware failures: If hardware fails during your reservation, you can terminate the affected instance and manually launch a replacement into the same Capacity Block reservation. The system returns the reserved capacity slot to your reservation after approximately 10 minutes of cleanup. Amazon EC2 maintains a buffer within each Capacity Block to support relaunching instances in case of hardware degradation, at no additional cost.
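As a sketch of the self-service flow (the instance type, dates, and IDs are placeholders), you can search for available Capacity Block offerings, purchase one, and then launch instances into the reservation during its active window:

aws ec2 describe-capacity-block-offerings \
--instance-type p5.48xlarge \
--instance-count 1 \
--capacity-duration-hours 24 \
--start-date-range 2025-07-01T00:00:00Z \
--end-date-range 2025-07-14T00:00:00Z

aws ec2 purchase-capacity-block \
--capacity-block-offering-id your-offering-id \
--instance-platform Linux/UNIX

aws ec2 run-instances \
--image-id your-ami-id \
--instance-type p5.48xlarge \
--count 1 \
--instance-market-options MarketType=capacity-block \
--capacity-reservation-specification 'CapacityReservationTarget={CapacityReservationId=your-capacity-block-id}'

The offering ID comes from the search response, and the capacity reservation ID is returned when the purchase completes.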
Note: Capacity Blocks are available only for specific instance types in supported AWS Regions, and a purchase can't be modified or canceled.
Amazon SageMaker training plans let you reserve GPU capacity for ML workloads in the Amazon SageMaker AI managed environment, such as training jobs, Amazon SageMaker HyperPod clusters, and inference. SageMaker training plans aren’t interchangeable with EC2 Capacity Blocks.
Note that G-type instances (except G6 instances) aren’t currently supported by SageMaker training plans. If you need G6 instances, contact your AWS account team. For detailed information about the supported instance types in a given AWS Region, duration, and quantity options, see Supported instance types, AWS Regions, and pricing.
Amazon SageMaker training plans apply to SageMaker training jobs, SageMaker HyperPod clusters, and SageMaker inference endpoints.
Choose this option when you want Amazon SageMaker AI to manage instance provisioning, scaling, and lifecycle while still securing reserved capacity during a defined window.
When planning your short-term GPU strategy, evaluate options based on three key factors: your infrastructure management model, your workload’s tolerance for launch-time uncertainty, and how critical a guaranteed start time is.
Start with the least restrictive option and move toward reserved capacity when availability or timing becomes critical.
Decision tree to choose the right option for securing GPU capacity.
Step 1: Determine your infrastructure management model
Step 2: Try on-demand capacity first
For both Amazon EC2 and Amazon SageMaker AI workloads, start with on-demand capacity. This approach requires no upfront commitment and lets you launch immediately when capacity is available.
If an initial launch fails, try these flexibility options: alternative instance types or sizes, other Availability Zones or AWS Regions, or retrying at a different time.
Step 3: Use reserved capacity when certainty is required
If your workload must start at a specific time or your delivery timeline depends on reserved GPU access, reserving capacity becomes the appropriate choice: use EC2 Capacity Blocks for self-managed workloads on Amazon EC2, or SageMaker training plans for managed SageMaker AI workloads.
This section shows you how to reserve and use GPU capacity for inference workloads on Amazon SageMaker AI using training plans. Note that SageMaker training plan reservations are specific to the selected target resource. A plan purchased for inference can’t be used for training jobs or HyperPod clusters, and vice versa.
The overall flow for other target resources, such as training jobs and HyperPod clusters, is similar.
Before you begin, confirm that you have an AWS Identity and Access Management (IAM) identity with permissions to create and manage SageMaker endpoints, such as the following policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sagemaker:CreateEndpointConfig",
        "sagemaker:CreateEndpoint",
        "sagemaker:DescribeEndpoint",
        "sagemaker:DeleteEndpoint",
        "sagemaker:DeleteEndpointConfig"
      ],
      "Resource": [
        "arn:aws:sagemaker:*:*:endpoint/*",
        "arn:aws:sagemaker:*:*:endpoint-config/*"
      ]
    }
  ]
}

To get started, go to the Amazon SageMaker AI console, choose Training plans in the left navigation pane, and choose Create training plan.
The Training plans page in the Amazon SageMaker AI console.
For example, choose your preferred start date and duration (1 day) and the instance type and count (1 ml.trn1.32xlarge) for the Inference Endpoint target, and choose Find training plan.
Configure your training plan by selecting the instance type, instance count, date and duration for your inference workload.
The console displays available plans with the total price.
Review the suggested plans with upfront pricing before accepting the reservation.
If you accept this training plan, add your training details in the next step and choose Create your plan.
Note: SageMaker training plans can’t be canceled after purchase. The reservation will expire automatically at the end of the reserved period.
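If you prefer the AWS CLI to the console, a sketch like the following searches for offerings that match the walkthrough’s values and purchases one (the offering ID comes from the search response; the plan name is a placeholder):

aws sagemaker search-training-plan-offerings \
--instance-type ml.trn1.32xlarge \
--instance-count 1 \
--target-resources endpoint \
--duration-hours 24

aws sagemaker create-training-plan \
--training-plan-name your-training-plan-name \
--training-plan-offering-id your-offering-id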
Review your training plan status in the console.
After you create your training plan, it appears in the list of training plans. The plan initially enters a Pending state, awaiting payment. You pay the full price of a training plan up front. After AWS completes payment processing, the plan transitions to the Scheduled state. On the plan’s start date, it becomes Active, and the system allocates resources for your use.
Use the following command to check the training plan status:
aws sagemaker describe-training-plan \
--training-plan-name your-training-plan-name \
--region your-region

When the response shows "Status": "Active", you can start running your inference tasks. Verify that the TargetResources field shows endpoint to confirm the plan is configured for inference workloads.
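To check just those fields, you can add a JMESPath query to the same call:

aws sagemaker describe-training-plan \
--training-plan-name your-training-plan-name \
--query '{Status: Status, Targets: TargetResources}'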
Use the following command to create an endpoint configuration that uses the training plan resources:
aws sagemaker create-endpoint-config \
--endpoint-config-name your-endpoint-config-name \
--production-variants '[
  {
    "VariantName": "your-variant-name",
    "ModelName": "your-model-name",
    "InitialInstanceCount": 1,
    "InstanceType": "ml.trn1.32xlarge",
    "CapacityReservationConfig": {
      "MlReservationArn": "your-training-plan-arn",
      "CapacityReservationPreference": "capacity-reservations-only"
    }
  }
]'

Create your endpoint resource by specifying the endpoint configuration from the previous step:
aws sagemaker create-endpoint \
--endpoint-name your-endpoint-name \
--endpoint-config-name your-endpoint-config-name
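Endpoint creation usually takes several minutes. If you’re scripting these steps, the built-in waiter blocks until the endpoint is in service:

aws sagemaker wait endpoint-in-service \
--endpoint-name your-endpoint-name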
Check your endpoint status and training plan capacity reservation status:

aws sagemaker describe-endpoint \
--endpoint-name your-endpoint-name \
--region your-region
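When the endpoint shows InService, a quick smoke test confirms that the reserved capacity is serving traffic. The request payload below is a placeholder; the actual body depends on your model’s input format:

aws sagemaker-runtime invoke-endpoint \
--endpoint-name your-endpoint-name \
--content-type application/json \
--cli-binary-format raw-in-base64-out \
--body '{"inputs": "test"}' \
output.json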
To avoid incurring ongoing charges, delete the resources that you created.

Delete the endpoint:
aws sagemaker delete-endpoint --endpoint-name your-endpoint-name

Delete the endpoint configuration:
aws sagemaker delete-endpoint-config --endpoint-config-name your-endpoint-config-name

Securing GPU capacity for transient workloads requires a different approach than planning long-term, steady-state usage. In this post, you learned how to approach short-term GPU capacity planning by starting with on-demand capacity, falling back to flexible launch options, and reserving capacity with EC2 Capacity Blocks or SageMaker training plans when timing is critical.
You also learned how to use SageMaker training plans to reserve GPU capacity ahead of time. This capability helps reduce operational friction when preparing inference capacity for planned evaluations, releases, or expected traffic increases.
To learn more, refer to the Amazon EC2 Capacity Blocks for ML and Amazon SageMaker training plans documentation.