Understanding Calendar mode for Dynamic Workload Scheduler: Reserve ML GPUs and TPUs

Organizations need ML compute resources that can accommodate bursty peaks and periodic troughs. That means the consumption models for AI infrastructure need to evolve to be more cost-efficient, provide term flexibility, and support rapid development on the latest GPU and TPU accelerators.

Calendar mode, the newest feature of Dynamic Workload Scheduler, is currently available in preview. It provides short-term reserved ML capacity (up to 90 days) without requiring long-term commitments.

Calendar mode extends the capabilities of Compute Engine future reservations to provide co-located GPU and TPU capacity that's a good fit for model training, fine-tuning, experimentation, and inference workloads.

Similar to a flight or hotel booking experience, Calendar mode makes it easy to search for and reserve ML capacity. Simply define your resource type, number of instances, expected start date and duration, and in a few seconds, you’ll be able to see the available capacity and reserve it. Once the capacity reservation is confirmed and delivered to your project, you can consume it via Compute Engine, Google Kubernetes Engine (GKE), Vertex AI custom training, and Google Batch.
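To make the booking analogy concrete, here is a minimal, illustrative sketch of the request you are expressing when you search for capacity: a resource type, an instance count, a start date, and a duration capped at the 90-day maximum. The `CapacityRequest` class and its field names are hypothetical, not part of any Google Cloud SDK; the actual search happens in the console or via the Compute Engine future reservations API.

```python
from dataclasses import dataclass
from datetime import date, timedelta

MAX_RESERVATION_DAYS = 90  # Calendar mode offers up to 90 days of reserved capacity


@dataclass
class CapacityRequest:
    """Hypothetical model of a Calendar mode capacity search (illustration only)."""
    resource_type: str    # e.g. a supported GPU or TPU machine type
    instance_count: int   # number of instances to reserve
    start_date: date      # expected start date of the reservation
    duration_days: int    # requested duration, 1-90 days

    def validate(self) -> None:
        # Enforce the constraints described above: at least one instance,
        # a future start date, and a duration within the 90-day cap.
        if self.instance_count < 1:
            raise ValueError("need at least one instance")
        if not 1 <= self.duration_days <= MAX_RESERVATION_DAYS:
            raise ValueError(f"duration must be 1-{MAX_RESERVATION_DAYS} days")
        if self.start_date <= date.today():
            raise ValueError("start date must be in the future")

    @property
    def end_date(self) -> date:
        # The reservation window ends duration_days after the start date.
        return self.start_date + timedelta(days=self.duration_days)
```

Once a request like this is fulfilled, the reserved capacity is what Compute Engine, GKE, Vertex AI custom training, or Google Batch consumes.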

What customers are saying

Over the past year, early access customers have used Calendar mode to reserve ML compute resources for a variety of use cases, from drug discovery to training new models.

Schrödinger

“To accelerate drug discovery, Schrödinger relies on large-scale simulations to identify promising, high-quality molecules. Reserving GPUs through Google Cloud’s DWS Calendar Mode provides us the crucial flexibility and assurance needed to cost-effectively scale our compute environment for critical, time-sensitive projects.” – Shane Brauner, EVP/CIO, Schrödinger

Vilya

“For Vilya, Dynamic Workload Scheduler has delivered on two key fronts: affordability and performance. The cost efficiency received was a significant benefit, and the reliable access to GPUs has empowered our teams to complete projects much faster, and it’s been invaluable for our computationally intensive tasks. It’s allowed us to be more efficient and productive without breaking the budget.” – Patrick Salveson, co-founder and CTO, Vilya

Databricks

“Databricks simplifies the deployment and management of machine learning models, enabling fine tuning and real-time inference for scalable production environments. DWS Calendar Mode alleviated the burden of GPU capacity planning and provided seamless access to the latest generation GPU hardware for dynamic demand for testing and ongoing training.” – Ravi Gadde, Sr. Director, Serverless Platform

Using Calendar mode

With these concepts and use cases under our belts, let’s take a look at how to find and reserve capacity via the Google Cloud console. Navigate to Cloud console -> Compute Engine -> Reservations. Then, on the Future reservations tab, click Create future reservation. Selecting a supported GPU or TPU will expose the Search for capacity section as shown below.

[Screenshot: Search for capacity section in the future reservation creation UI]

Proceed to the Advanced Settings to determine whether the reservation should be shared across multiple projects. The final step is to name the reservation upon creation.

[Screenshot: reservation submission]

[Screenshot: reservation list]

The reservation is typically approved within minutes, and the capacity can be consumed at the specified start time once the reservation reaches the Fulfilled status.
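The wait-then-consume flow above can be sketched as a simple polling loop. This is an illustration only: `get_status` is a placeholder for however you actually check the reservation (the console, `gcloud`, or the Compute Engine API), not a real client call.

```python
import time
from typing import Callable


def wait_for_fulfillment(get_status: Callable[[], str],
                         poll_seconds: float = 30.0,
                         max_polls: int = 120) -> bool:
    """Poll a reservation's status until it reports 'Fulfilled'.

    `get_status` is a hypothetical callback that returns the current
    reservation status string; swap in your real status check.
    Returns True once the reservation is Fulfilled, or False if it
    has not been fulfilled after max_polls checks.
    """
    for _ in range(max_polls):
        if get_status() == "Fulfilled":
            return True
        time.sleep(poll_seconds)
    return False
```

In practice a workload scheduler (GKE, Vertex AI, or Batch) consumes the reservation for you at the start time; a loop like this is only useful if you want to gate your own automation on the status transition.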

Get started today

Calendar mode with AI Hypercomputer makes finding, reserving, consuming, and managing capacity easy for ML workloads. Get started today with Calendar mode for TPUs, or contact your account team for GPU access in Compute Engine, GKE, or Slurm. To learn more, see the Calendar mode documentation and Dynamic Workload Scheduler pricing.