Artificial Intelligence (AI) and large language models (LLMs) are experiencing explosive growth, powering applications from machine translation to artistic creation. These technologies rely on intensive computations that require specialized hardware resources, like GPUs. But access to GPUs can be challenging, both in terms of availability and cost.
For Google Cloud users, the introduction of Dynamic Workload Scheduler (DWS) transformed how you can access and use GPU resources, particularly within a Google Kubernetes Engine (GKE) cluster. Dynamic Workload Scheduler optimizes AI/ML resource access and spending by simultaneously scheduling necessary accelerators like TPUs and GPUs across various Google Cloud services, improving the performance of training and fine-tuning jobs.
Further, Dynamic Workload Scheduler offers an easy and straightforward integration between GKE and Kueue, a cloud-native job scheduler, making it easier to access GPUs as quickly as possible, in a given region, for a given GKE cluster.
But what if you want to deploy your workload in any available region, as soon as possible, as soon as DWS provides you the resources your workload needs?
This is where MultiKueue, a Kueue feature, comes into play. With MultiKueue, GKE, and Dynamic Workload Scheduler, you can wait for accelerators in multiple regions. Dynamic Workload Scheduler automatically provisions resources in the best GKE clusters as soon as they are available. By submitting workloads to a global queue, MultiKueue executes them in the region with available GPU resources, helping to optimize global resource usage, lowering costs, and speeding up processing.
MultiKueue enables workload distribution across multiple GKE clusters in different regions. By identifying clusters with available resources, MultiKueue simplifies the process of dispatching jobs to the optimal location.
Dynamic Workload Scheduler is also supported on GKE Autopilot, our managed Kubernetes service that automatically handles the provisioning, scaling, security, and maintenance of your container infrastructure; support starts with GKE Autopilot 1.30.3. Let’s take a deeper look at how to set up and manage MultiKueue with Dynamic Workload Scheduler, so you can obtain GPU resources faster.
MultiKueue provides two distinct cluster roles:
Manager cluster – Establishes and maintains the connection with the worker clusters, and creates and monitors remote objects (workloads or jobs) while keeping the local ones in sync.
Worker cluster – A simple standalone Kueue cluster that executes the jobs submitted by the manager cluster.
In this example we create four GKE Autopilot clusters:
One manager cluster in europe-west4
Three worker clusters in
europe-west4
us-east4
asia-southeast1
Let’s take a look at how this works in the following step-by-step example. You can access the files for this example in this GitHub repository.
1. Clone the GitHub repository
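A minimal sketch of this step; the repository URL below is a placeholder, so substitute the repository linked above:

```shell
# Clone the example repository (placeholder URL -- replace with the
# repository referenced in this article) and enter its directory.
git clone https://github.com/<your-org>/<dws-multikueue-example>.git
cd <dws-multikueue-example>
```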
2. Create GKE clusters
This Terraform script creates the required GKE clusters and adds four entries to your kubeconfig file:
manager-europe-west4
worker-us-east4
worker-europe-west4
worker-asia-southeast1
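Applying the script follows the standard Terraform workflow; the `project_id` variable name is an assumption, so check the repository's variable definitions:

```shell
# Initialize providers and modules, then create the four
# GKE Autopilot clusters (this can take several minutes).
terraform init
terraform apply -var "project_id=<YOUR_PROJECT_ID>"
```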
Then you can switch between contexts easily with
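For example, using the kubeconfig entries created above:

```shell
# Make the manager cluster the current context...
kubectl config use-context manager-europe-west4

# ...or target a worker cluster for a single command without switching:
kubectl get nodes --context worker-us-east4
```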
3. Install and configure MultiKueue
This script:
Installs Kueue on the four clusters
Enables and configures MultiKueue in the manager cluster
Creates a PodMonitoring resource for each cluster so that Kueue metrics are sent to Google Cloud Managed Service for Prometheus
Configures the connection between the manager cluster and the worker clusters
Configures Kueue in the worker clusters
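The first and last of these steps can be sketched as follows; the Kueue version, cluster loop, and Secret name are assumptions (the script in the repository pins its own values):

```shell
KUEUE_VERSION=v0.8.0  # assumption: use the version pinned by the script

# Install Kueue on all four clusters from the upstream release manifests.
for ctx in manager-europe-west4 worker-europe-west4 worker-us-east4 worker-asia-southeast1; do
  kubectl apply --server-side --context "$ctx" \
    -f "https://github.com/kubernetes-sigs/kueue/releases/download/${KUEUE_VERSION}/manifests.yaml"
done

# Give the manager access to a worker by storing that worker's kubeconfig
# in a Secret in kueue-system; the MultiKueueCluster object references it.
kubectl create secret generic worker-us-east4-secret \
  --context manager-europe-west4 -n kueue-system \
  --from-file=kubeconfig=worker-us-east4.kubeconfig
```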
GKE clusters, Kueue with MultiKueue, and DWS are now configured and ready to use. Once you submit your jobs, the Kueue manager distributes them across the three worker clusters.
In the dws-multi-worker.yaml file, you’ll find the Kueue configuration for the worker clusters, including the manager configuration.
The following script provides a basic example of how to set up the MultiKueue AdmissionCheck with three worker clusters.
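A configuration along these lines wires the pieces together on the manager cluster; the object names are illustrative, and each `MultiKueueCluster` (one per worker, only one shown) points at the kubeconfig Secret for that worker:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: multikueue-dws
spec:
  controllerName: kueue.x-k8s.io/multikueue
  parameters:
    apiGroup: kueue.x-k8s.io
    kind: MultiKueueConfig
    name: multikueue-dws
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: MultiKueueConfig
metadata:
  name: multikueue-dws
spec:
  clusters:
  - worker-europe-west4
  - worker-us-east4
  - worker-asia-southeast1
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: MultiKueueCluster
metadata:
  name: worker-us-east4
spec:
  kubeConfig:
    locationType: Secret
    location: worker-us-east4-secret   # Secret in the kueue-system namespace
```

The ClusterQueue on the manager then lists this AdmissionCheck in its `spec.admissionChecks`, so every admitted workload goes through MultiKueue.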
4. Submit jobs
Ensure you’re using the manager kubecontext when submitting jobs.
To observe how the MultiKueue admission check distributes jobs among worker clusters, you can submit the job creation request multiple times.
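A job along these lines illustrates the shape of a submission; the LocalQueue name, accelerator type, and image are assumptions (the repository's sample job defines its own):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  generateName: dws-job-
  labels:
    kueue.x-k8s.io/queue-name: dws-local-queue   # assumed LocalQueue name
spec:
  parallelism: 1
  completions: 1
  suspend: true          # Kueue unsuspends the job once it is admitted
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-l4   # assumed GPU type
      containers:
      - name: cuda
        image: nvidia/cuda:12.3.1-base-ubuntu22.04
        command: ["nvidia-smi"]
        resources:
          limits:
            nvidia.com/gpu: 1
      restartPolicy: Never
```

Because the manifest uses `generateName`, submit it with `kubectl create -f job.yaml --context manager-europe-west4` (repeating the command creates a new job each time).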
5. Get jobs status
To check the job status and determine the scheduled region, execute the following command:
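For example, from the manager cluster (the jsonpath query is a sketch; `kubectl describe workload <name>` shows the same admission-check messages in a readable form):

```shell
# List the jobs known to the manager cluster.
kubectl get jobs --context manager-europe-west4

# Each Kueue Workload records its MultiKueue admission result; the
# admission-check message names the worker cluster that won the reservation.
kubectl get workloads --context manager-europe-west4 \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.admissionChecks}{"\n"}{end}'
```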
6. Delete resources
Finally, be sure to delete the four GKE clusters you created to try out this functionality:
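Assuming the clusters were created with the Terraform script from step 2, they can be removed the same way:

```shell
# Tear down the four GKE clusters to avoid ongoing charges
# (run from the directory where terraform apply was executed).
terraform destroy
```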
So that’s how you can leverage MultiKueue, GKE, and DWS to streamline global job execution, optimize speed, and eliminate the need for manual node management!
This setup also addresses the needs of those with data residency requirements, allowing you to dedicate subsets of clusters for different workloads and ensure compliance.
To further enhance your setup, you can leverage advanced Kueue features like team management with local queues or workload priority classes. Additionally, you can gain valuable insights by creating a Grafana or Cloud Monitoring dashboard that utilizes Kueue metrics, which are automatically collected by Google Cloud Managed Service for Prometheus via the PodMonitoring resources.