Innovating with AI requires accelerators such as GPUs that can be hard to come by in times of extreme demand. To address this challenge, we offer Dynamic Workload Scheduler (DWS), a service that optimizes access to compute resources when and where you need them. In July, we announced Calendar mode in DWS to provide short-term ML capacity without long-term commitments, and today, we are taking the next step: the general availability (GA) of Flex-start VMs.
Available through the Compute Engine instance API, gcloud CLI, and the Google Cloud console, Flex-start VMs provide a simple and direct way to create single VM instances that can wait for in-demand GPUs. This makes it easy to integrate this flexible consumption option into your existing workflows and schedulers.
Flex-start VMs, powered by Dynamic Workload Scheduler, introduce a highly differentiated consumption model that’s a first among major cloud providers, letting you create single VM instances that provide fair and improved access to GPUs. Flex-start VMs are ideal for defined-duration tasks such as AI model fine-tuning, batch inference, HPC, and research experiments that don’t need to start immediately. In exchange for being flexible with start time, you get two major benefits:
Flex-start VMs can run uninterrupted for a maximum of seven days and consume preemptible quota.
Use the request-valid-for-duration flag. Select a period between 90 seconds and 2 hours to instruct Compute Engine to hold your request in a queue. Your VM enters a PENDING state, and the system works to provision your resources as they become available within your specified timeframe. This “get-in-line” approach provides a fair and managed way to access hardware, transforming the user experience from one of repeated manual retries to a simple, one-time request.
Set instanceTerminationAction = STOP so that when your VM’s seven-day runtime expires, the instance is stopped rather than deleted. This preserves your VM’s configuration, including its IP address and boot disk, saving on setup time for subsequent runs.
Getting started with a queued Flex-start VM is straightforward. You can create one using a gcloud command or directly through the API.
gcloud example (to wait in queue):
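A minimal sketch of such a command, written as a small shell script that prints the command for review and submits it only when RUN=1 is set. The instance name, zone, machine type, and the RUN switch are illustrative assumptions, not values from this announcement. The flags other than request-valid-for-duration are assumptions about the current gcloud CLI and should be checked against `gcloud compute instances create --help` before use.

```shell
#!/bin/sh
# Sketch only: names, zone, machine type, and durations below are placeholders.
VM_NAME="my-flex-start-vm"     # hypothetical instance name
ZONE="us-central1-a"           # assumed zone; choose one that offers your GPU type
MACHINE_TYPE="a3-highgpu-8g"   # assumed GPU machine type
WAIT="2h"                      # time to wait in the queue (90 seconds to 2 hours)
MAX_RUN="7d"                   # runtime cap (Flex-start VMs run for at most seven days)

# Build the command as positional parameters so every argument stays quoted.
set -- gcloud compute instances create "$VM_NAME" \
  --zone="$ZONE" \
  --machine-type="$MACHINE_TYPE" \
  --provisioning-model=FLEX_START \
  --request-valid-for-duration="$WAIT" \
  --max-run-duration="$MAX_RUN" \
  --instance-termination-action=STOP \
  --maintenance-policy=TERMINATE \
  --reservation-affinity=none

# Print the command for review; it is submitted only when RUN=1 is set.
CMD_PREVIEW="$*"
echo "$CMD_PREVIEW"
if [ "${RUN:-0}" = "1" ]; then
  "$@"
fi
```

Running the script as-is only prints the command. Running it with RUN=1 in an authenticated gcloud session submits the request, after which the VM should appear in the PENDING state until capacity is found or the wait period ends.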
API Request Snippet (JSON):
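A hedged sketch of what such a request body might look like, built in a shell variable so it can be inspected before it is sent. The project ID, instance name, zone, machine type, image, and network are placeholders. The field names (in particular params.requestValidForDuration and the scheduling fields) are assumptions about the Compute Engine instances.insert schema and should be verified against the API reference.

```shell
#!/bin/sh
# Sketch only: every identifier below is a placeholder, not a value from the announcement.
PROJECT_ID="my-project"        # hypothetical project ID
ZONE="us-central1-a"           # assumed zone
MACHINE_TYPE="a3-highgpu-8g"   # assumed GPU machine type

# Request body: wait up to 2 hours (7200 s) in the queue, run for at most
# 7 days (604800 s), and stop rather than delete the VM when the runtime ends.
REQUEST_BODY=$(cat <<EOF
{
  "name": "my-flex-start-vm",
  "machineType": "zones/${ZONE}/machineTypes/${MACHINE_TYPE}",
  "disks": [
    {
      "boot": true,
      "initializeParams": {
        "sourceImage": "projects/debian-cloud/global/images/family/debian-12"
      }
    }
  ],
  "networkInterfaces": [{"network": "global/networks/default"}],
  "reservationAffinity": {"consumeReservationType": "NO_RESERVATION"},
  "scheduling": {
    "provisioningModel": "FLEX_START",
    "instanceTerminationAction": "STOP",
    "maxRunDuration": {"seconds": 604800},
    "onHostMaintenance": "TERMINATE"
  },
  "params": {
    "requestValidForDuration": {"seconds": 7200}
  }
}
EOF
)
echo "$REQUEST_BODY"

# To submit the request (requires an authenticated gcloud session), uncomment:
# curl -X POST \
#   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   -H "Content-Type: application/json" \
#   -d "$REQUEST_BODY" \
#   "https://compute.googleapis.com/compute/v1/projects/${PROJECT_ID}/zones/${ZONE}/instances"
```

The two durations mirror the limits described above: a queue wait of up to 2 hours and a runtime of up to seven days.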
Flex-start VMs in the Instance API are a direct response to the need for more efficient, reliable, and fair access to high-demand AI accelerators. By introducing a novel queuing mechanism, Flex-start VMs let you integrate the new consumption model into your existing workflows easily, so you can spend less time architecting retry loops for on-demand access. To learn more and try Flex-start VMs today, see the documentation and pricing information.