GPUs when you need them: Introducing Flex-start VMs

Innovating with AI requires accelerators such as GPUs that can be hard to come by in times of extreme demand. To address this challenge, we offer Dynamic Workload Scheduler (DWS), a service that optimizes access to compute resources when and where you need them. In July, we announced Calendar mode in DWS to provide short-term ML capacity without long-term commitments, and today, we are taking the next step: the general availability (GA) of Flex-start VMs.

Available through the Compute Engine instance API, gcloud CLI, and the Google Cloud console, Flex-start VMs provide a simple and direct way to create single VM instances that can wait for in-demand GPUs. This makes it easy to integrate this flexible consumption option into your existing workflows and schedulers.

What are Flex-start VMs?

Flex-start VMs, powered by Dynamic Workload Scheduler, introduce a highly differentiated consumption model that’s a first among major cloud providers, letting you create single VM instances that provide fair and improved access to GPUs. Flex-start VMs are ideal for defined-duration tasks such as AI model fine-tuning, batch inference, HPC, and research experiments that don’t need to start immediately. In exchange for being flexible with start time, you get two major benefits:

Dramatically improved resource obtainability: By allowing your capacity requests to persist in a queue for up to two hours, you increase the likelihood of securing resources, without needing to build your own retry logic.
Cost-effective pricing: Flex-start VM SKUs offer significant discounts compared to standard on-demand pricing, making cutting-edge accelerators more accessible.

Flex-start VMs can run uninterrupted for a maximum of seven days and consume preemptible quota.

A new way to request capacity

With Flex-start VMs, you can now choose how your request is handled if capacity isn’t immediately available using a single parameter: request-valid-for-duration.

Without this parameter, Compute Engine makes a short, best-effort attempt (about 90 seconds) to secure your resources when you create a VM. If capacity is available, your VM is provisioned. If not, the request fails quickly with a stockout error. This “fail-fast” behavior is good for workflows where you need an answer immediately so you can make scheduling decisions such as trying another zone or falling back to a different machine type.
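
The fail-fast path lends itself to a simple fallback loop in a custom scheduler. The following is a minimal Python sketch under stated assumptions, not an official client: try_create is a hypothetical callable standing in for whatever issues the real create request, StockoutError is a hypothetical exception representing the stockout error, and the zone and machine-type values are placeholders.

```python
from typing import Callable, Iterable, Optional, Tuple


class StockoutError(Exception):
    """Hypothetical exception: the create request failed for lack of capacity."""


def first_available(
    candidates: Iterable[Tuple[str, str]],
    try_create: Callable[[str, str], str],
) -> Optional[str]:
    """Try each (zone, machine_type) pair in order; return the first VM created.

    `try_create` is an assumed wrapper around the real create call. It should
    return the VM name on success and raise StockoutError on a stockout.
    """
    for zone, machine_type in candidates:
        try:
            return try_create(zone, machine_type)
        except StockoutError:
            # Fail-fast: the answer came back quickly, so try the next option.
            continue
    return None  # every candidate was out of capacity


if __name__ == "__main__":
    def fake_create(zone: str, machine_type: str) -> str:
        # Stand-in for a real API call: only one zone "has capacity" here.
        if zone != "us-central1-b":
            raise StockoutError(zone)
        return f"vm-{zone}-{machine_type}"

    options = [
        ("us-central1-a", "a3-megagpu-8g"),
        ("us-central1-b", "a3-megagpu-8g"),
    ]
    print(first_available(options, fake_create))
```

Because the create call is injected, the same loop can be exercised with a fake in tests and with a real API wrapper in production.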

However, for workloads that can wait, you can now make a persistent capacity request by setting the request-valid-for-duration flag. Select a period between 90 seconds and 2 hours to instruct Compute Engine to hold your request in a queue. Your VM enters a PENDING state, and the system works to provision your resources as they become available within your specified timeframe. This “get-in-line” approach provides a fair and managed way to access hardware, transforming the user experience from one of repeated manual retries to a simple, one-time request.
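
A scheduler that sets this value programmatically may want to check the range before sending the request. Here is a minimal Python sketch, assuming both bounds are inclusive (the text only says "between") and reusing the field name from the JSON request snippet in this post; queue_params is a hypothetical helper, not part of any Google library.

```python
MIN_WAIT_SECONDS = 90            # lower bound stated in the text
MAX_WAIT_SECONDS = 2 * 60 * 60   # upper bound stated in the text (2 hours)


def queue_params(wait_seconds: int) -> dict:
    """Build the `params` fragment for a queued (persistent) capacity request.

    Raises ValueError if the wait falls outside the 90-second to 2-hour window.
    Inclusive bounds are an assumption made for this sketch.
    """
    if not MIN_WAIT_SECONDS <= wait_seconds <= MAX_WAIT_SECONDS:
        raise ValueError(
            f"wait must be between {MIN_WAIT_SECONDS} and {MAX_WAIT_SECONDS} "
            f"seconds, got {wait_seconds}"
        )
    # The API snippet in this post encodes seconds as a string.
    return {"request_valid_for_duration": {"seconds": str(wait_seconds)}}


if __name__ == "__main__":
    print(queue_params(2 * 60 * 60))
```

Validating locally only catches obvious mistakes early; the service remains the authority on which values it accepts.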

Key features of Flex-start VMs

Flex-start VMs offer several critical features for flexibility and ease of use:

Direct instance API access: Integration with instances.insert, or via a single CLI command, lets you create single Flex-start VMs simply and directly, making it easy to integrate them into custom schedulers and workflows.
Stop and start capabilities: You have full control over your Flex-start VMs. For instance, you can stop an instance to pause billing and release the underlying resources. Then, when you’re ready to resume it, simply issue a start command to place a new capacity request. Once the capacity is successfully provisioned, the seven-day maximum run duration clock resets.
Configurable termination action: For many advanced use cases, you can set instanceTerminationAction = STOP so that when your VM’s seven-day runtime expires, the instance is stopped rather than deleted. This preserves your VM’s configuration, including its IP address and boot disk, saving on setup time for subsequent runs.
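
The run-duration limit and the termination action can be combined in one scheduling fragment. The Python sketch below is illustrative only: flex_start_scheduling is a hypothetical helper, and placing instanceTerminationAction inside the scheduling object is an assumption to verify against the Compute Engine API reference.

```python
MAX_RUN_SECONDS = 7 * 24 * 60 * 60  # seven-day maximum run duration


def flex_start_scheduling(run_seconds: int, stop_on_expiry: bool = True) -> dict:
    """Build a `scheduling` fragment for a Flex-start VM.

    Raises ValueError if the requested run exceeds the seven-day limit.
    """
    if not 0 < run_seconds <= MAX_RUN_SECONDS:
        raise ValueError(
            f"run duration must be between 1 and {MAX_RUN_SECONDS} seconds"
        )
    scheduling = {
        "provisioningModel": "FLEX_START",
        "maxRunDuration": {"seconds": str(run_seconds)},
    }
    if stop_on_expiry:
        # Assumed placement: stop (rather than delete) the VM when the run
        # duration expires, keeping its configuration for later runs.
        scheduling["instanceTerminationAction"] = "STOP"
    return scheduling


if __name__ == "__main__":
    print(flex_start_scheduling(3 * 24 * 60 * 60))
```

Leaving stop_on_expiry off omits the field entirely, so the service's default termination behavior applies.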

What customers have to say

Customers across research and industry are using Flex-start VMs to improve their access to scarce accelerators.

“Our custom scheduling environment demands precise control and direct API access. The GA of Flex-start in the Instance API, particularly with its stop/start capabilities and configurable termination, is a game-changer. It allows us to seamlessly integrate this new, highly-efficient consumption model into our complex workflows, maximizing both our resource utilization and performance.” – Ragnar Kjørstad, Systems Engineer, Hudson River Trading (HRT)

“For our critical anti-fraud model training, Flex-start VMs are a game-changer. The queuing mechanism gives us reliable access to powerful A100 GPUs, which enhances our development cycles and security offerings at a significant performance-to-cost advantage.” – Bakai Zhamgyrchiev, Head of ML, Oz Forensics

Get started today

Getting started with a queued Flex-start VM is straightforward. You can create one using a gcloud command or directly through the API.

gcloud example (to wait in queue):

    gcloud beta compute instances create my-flex-start-vm \
        --machine-type=a3-megagpu-8g \
        --provisioning-model=FLEX_START \
        --max-run-duration=3d \
        --request-valid-for-duration=2h \
        --zone=us-central1-a

API Request Snippet (JSON):

    {
      "name": "my-flex-start-vm",
      "machineType": "zones/us-central1-a/machineTypes/a3-megagpu-8g",
      "scheduling": {
        "provisioningModel": "FLEX_START",
        "maxRunDuration": {
          "seconds": "259200"
        }
      },
      "params": {
        "request_valid_for_duration": {
          "seconds": "7200"
        }
      },
      …
    }
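
The gcloud example and the API request express the same durations in different units: 3d and 2h on the command line, 259200 and 7200 seconds in JSON. The small Python sketch below shows the conversion. to_seconds is a hypothetical helper that handles only a single number followed by s, m, h, or d, which is a simplification and not the full set of duration formats gcloud accepts.

```python
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def to_seconds(duration: str) -> int:
    """Convert a single-unit duration such as '3d' or '2h' to whole seconds."""
    value, unit = duration[:-1], duration[-1:]
    if unit not in _UNIT_SECONDS or not value.isdigit():
        raise ValueError(f"unsupported duration: {duration!r}")
    return int(value) * _UNIT_SECONDS[unit]


if __name__ == "__main__":
    for text in ("3d", "2h"):
        print(text, "=", to_seconds(text), "seconds")
```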

Flex-start VMs in the Instance API are a direct response to the need for more efficient, reliable, and fair access to high-demand AI accelerators. Because queuing is built into the request itself, you can integrate the Flex-start consumption model into your existing workflows easily and spend less time architecting retry loops for on-demand access. To learn more and try Flex-start VMs today, see the documentation and pricing information.
