Introducing G2 VMs with NVIDIA L4 GPUs — a cloud-industry first

Organizations across industries are looking to AI to turn troves of data into intelligence, powered by the latest advances in generative AI. Yet for many organizations, there is a barrier to adopting the latest models because they can be costly to train or serve. A new class of cloud GPUs is needed to lower the cost of entry for businesses that want to tap the power of AI.

Today, we’re introducing G2, the newest addition to the Compute Engine GPU family in Google Cloud. G2 is the industry’s first cloud VM powered by the newly announced NVIDIA L4 Tensor Core GPU, and is purpose-built for large inference AI workloads like generative AI. G2 delivers cutting-edge performance-per-dollar for AI inference workloads that run on GPUs in the cloud. By switching from NVIDIA A10G GPUs to G2 instances with L4 GPUs, organizations can lower their production infrastructure costs up to 40%. We also found that customers switching from NVIDIA T4 GPUs to L4 GPUs can achieve 2x-4x better performance. As a universal GPU offering, G2 instances also help accelerate other workloads, offering significant performance improvements on HPC, graphics, and video transcoding. Currently in private preview, G2 VMs are both powerful and flexible, and scale easily from one up to eight GPUs.

Currently organizations require end-to-end enterprise ready infrastructure that will future proof their AI and HPC initiatives for a new era. G2s will be ready to be deployed on Vertex AI, GKE, and GCE, giving customers the freedom to architect their own custom software stack to meet their performance requirements and budget. With optimized Vertex AI support for G2 VMs, AI users can tap the latest generative AI models and technologies. With an easy to use UI and automated workflows, customers can access, tune and serve modern models for video, text, images, and audio without the toil of manual optimizations. The combination of these services with the power of G2 will help customers harness the power of complex machine models for their business.

NVIDIA L4 GPUs with Ada Lovelace Architecture

G2 machine families enable machine learning customers to run their production infrastructure in the cloud for a variety of applications such as language models, image classification, object detection, automated speech recognition, and language translation. Built on the Ada Lovelace architecture with fourth-generation Tensor Cores, the NVIDIA L4 GPU provides up to 30 TFLOPS of performance for FP32, and 242 TFLOPs for FP16. Newly added FP8 support, on top of existing INT8, BFLOAT16 and TF32 capabilities, makes the L4 ideal for ML inference.

With the latest third-generation RT Cores and DLSS 3.0 technology, G2 instances are also great for graphics-intensive workloads such as rendering and remote workstations when paired with NVIDIA RTX Virtual Workstation. NVIDIA L4 provides 3x video encoding and decoding performance, and adds new AV1 hardware-encoding capabilities. For example, G2 can enable gaming customers running game engines such as Unreal and Unity with modern graphics cards to run real-time applications. Likewise, media and entertainment customers that need GPU-enabled virtual workstations can use the L4 to create photo-realistic, high-resolution 3D content for movies, games, and AR/VR experiences using applications such as Autodesk Maya or 3D Studio Max.

What customers are saying

A handful of early customers have been testing G2 and have seen great results in real-world applications. Here are what some of them have to say about the benefits that G2 with NVIDIA L4 GPUs bring:

AppLovin

AppLovin enables developers and marketers to grow with market leading technologies. Businesses rely on AppLovin to solve their mission-critical functions with a powerful, full stack solution including user acquisition, retention, monetization and measurement.

“AppLovin serves billions of AI powered recommendations per day, so scalability and value are essential to our business,” said Omer Hasan, Vice President, Operations at AppLovin. “With Google Cloud’s G2 we’re seeing that NVIDIA L4 GPUs offer a significant increase in the scalability of our business, giving us the power to grow faster than ever before.”

Wombo AI

WOMBO aims to unleash everyone’s creativity through the magic of AI, transforming the way content is created, consumed, and distributed.

“WOMBO relies upon the latest AI technology for people to create immersive digital artwork from users’ prompts, letting them create high-quality, realistic art in any style with just an idea,” said Ben-Zion Benkhin, Co-Founder and CEO of WOMBO. “Google Cloud’s G2 instances powered by NVIDIA’s L4 GPUs will enable us to offer a better, more efficient image-generation experience for users seeking to create and share unique artwork.”

Descript

Descript’s AI-powered features and intuitive interface fuel YouTube and TikTok channels, top podcasts, and businesses using video for marketing, sales, and internal training and collaboration. Descript aims to make video a staple of every communicator’s toolkit, alongside docs and slides.

“G2 with L4’s AI Video capabilities allow us to deploy new features augmented by natural-language processing and generative AI to create studio-quality media with excellent performance and energy efficiency” said Kundan Kumar, Head of Artificial Intelligence at Descript.

Workspot

Workspot believes that the software-as-a-service (SaaS) model is the most secure, accessible and cost-effective way to deliver an enterprise desktop and should be central to accelerating the digital transformation of the modern enterprise.

“The Workspot team looks forward to continuing to evolve our partnership with Google Cloud and NVIDIA. Our customers have been seeing incredible performance leveraging NVIDIA’s T4 GPUs. The new G2 instances with L4 GPUS through Workspot’s remote Cloud PC workstations provide 2x and higher frame rates at 1280×711 and higher resolutions” said Jimmy Chang, Chief Product Officer at Workspot.

Pricing and availability

G2 instances are currently in private preview in the following regions: us-central1, asia-southeast1 and europe-west4. Submit your request here to join the private preview, or to receive a notification as when the public preview begins. Support will be coming to Google Kubernetes Engine (GKE), Vertex AI, and other Google Cloud services as well. We’ll share G2 public availability and pricing information later in the year.