Many of today’s multimodal workloads require a powerful mix of GPU-based accelerators, large GPU memory, and professional graphics to achieve the performance and throughput that they need. Today, we announced the general availability of the G4 VM, powered by NVIDIA’s RTX PRO 6000 Blackwell Server Edition GPUs. The addition of the G4 expands our comprehensive NVIDIA GPU portfolio, complementing the specialized scale of the A-series VMs, and the cost-efficiency of G2 VMs. The G4 VM is available now, bringing GPU availability to more Google Cloud regions than ever before, for applications that are latency sensitive or have specific regulatory requirements.
We also announced the general availability of NVIDIA Omniverse as a virtual machine image (VMI) on Google Cloud Marketplace. Running Omniverse on G4 makes it easier than ever to develop and deploy industrial digital twin and physical AI simulation applications built on NVIDIA Omniverse libraries. G4 VMs provide the necessary infrastructure — up to 768 GB of GDDR7 memory, NVIDIA Tensor Cores, and fourth-generation Ray Tracing (RT) cores — to run the demanding real-time rendering and physically accurate simulations required for enterprise digital twins. Together, they provide a scalable cloud environment to build, deploy, and interact with applications for industrial digital twins or robotics simulation.
A universal GPU platform
The G4 VM offers a profound leap in performance, with up to 9x the throughput of G2 instances, enabling a step-change in results across a wide spectrum of workloads, from multimodal AI inference and photorealistic design and visualization to robotics simulation using applications developed on NVIDIA Omniverse. The G4 currently comes in configurations of 1, 2, 4, or 8 NVIDIA RTX PRO 6000 Blackwell GPUs, with fractional GPU options coming soon.
Here are some of the ways you can use G4 to innovate and accelerate your business:
AI training, fine-tuning, and inference
- Generative AI acceleration and efficiency: With its FP4 precision support, G4’s high-efficiency compute accelerates LLM fine-tuning and inference, letting you build real-time generative AI applications powered by multimodal and text-to-image models.
- Resource optimization with Multi-Instance GPU (MIG) support: G4 allows a single GPU to be securely partitioned into up to four fully isolated MIG instances, each with its own high-bandwidth memory, compute cores, and dedicated media engines. This feature maximizes price-performance by enabling multiple smaller distinct workloads to run concurrently with guaranteed resources, isolation, and quality of service.
- Flexible model capacity and scaling: Serve a wide range of models, from less than 30B to over 100B parameters, by leveraging advanced quantization techniques, MIG partitioning, and multi-GPU configurations.
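The memory math behind these options is simple to sketch. The snippet below is a back-of-the-envelope estimate (weights only, ignoring KV cache and activation overhead, and treating 1 GB as 10^9 bytes) of how quantization changes how many G4 GPUs a given model needs:

```python
import math

# Weights-only memory estimate at different precisions, against the
# G4's 96 GB per GPU. Real serving needs extra headroom for the KV
# cache and activations, so treat these figures as lower bounds.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "fp4": 0.5}
GPU_MEMORY_GB = 96

def weight_memory_gb(params_billion, precision):
    """GB (10^9 bytes) needed to hold the weights alone."""
    return params_billion * BYTES_PER_PARAM[precision]

def gpus_needed(params_billion, precision):
    """Minimum GPUs whose combined memory fits the weights."""
    return math.ceil(weight_memory_gb(params_billion, precision) / GPU_MEMORY_GB)

for size in (30, 70, 120):
    for prec in ("fp16", "fp4"):
        gb = weight_memory_gb(size, prec)
        print(f"{size}B @ {prec}: {gb:.0f} GB of weights -> {gpus_needed(size, prec)} GPU(s)")
```

At FP4, even a 120B-parameter model’s weights fit within a single 96 GB GPU, which is the intuition behind pairing quantization with MIG partitioning for smaller models and multi-GPU configurations for larger ones.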
NVIDIA Omniverse and simulation
- NVIDIA Omniverse integration: Choose this foundation to build and connect simulation applications using physically-based simulation and OpenUSD that enable real-time interactivity and the development of AI-accelerated digital twins.
- Large-scale digital twin acceleration: Accelerate proprietary or commercial computer-aided engineering and simulation software to run scenarios with billions of cells in complex digital twin environments.
- Near-real-time physics analysis: Leverage the G4’s parallel compute power and memory to handle immense computational domains, enabling near-real-time computational fluid dynamics and complex physics analysis for high-fidelity simulations.
- Robotics development: With NVIDIA Isaac Sim, an open-source reference robotics simulation framework, customers can now create, train, and simulate AI-driven robots in physical and virtual environments. Isaac Sim is now available on the Google Cloud Marketplace.
AI-driven rendering, graphics and virtual workstations
- AI-augmented content creation: Harness neural shaders and fifth-generation NVIDIA Tensor Cores to integrate AI directly into a programmable rendering pipeline, driving the next decade of AI-augmented graphics innovations, including real-time cinematic rendering and enhanced content creation.
- Massive scene handling: Leverage up to 96 GB of memory per GPU on the G4 to create and render large, complex 3D models and photorealistic visualizations with stunning detail and accuracy.
- Virtual workstations: Fuel digital twins, simulation, and VFX workloads. The G4’s leap in capability is powered by full support for all NVIDIA DLSS 4 features, the latest NVENC/NVDEC encoders for video streaming and transcode, and fourth-generation RT Cores for real-time ray tracing.
Google Cloud scales NVIDIA RTX PRO 6000
Modern generative AI models often exceed the VRAM of a single GPU, requiring multi-GPU configurations to serve these workloads. While this approach is common, performance can be bottlenecked by the communication speed between GPUs. We significantly boosted multi-GPU performance on G4 VMs by implementing an enhanced PCIe-based peer-to-peer (P2P) data path that optimizes critical collective operations like All-Reduce, which is essential when splitting models across GPUs. Thanks to the G4’s enhanced peer-to-peer capabilities, you can expect up to 168% higher throughput and up to 41% lower inter-token latency when using tensor parallelism for model serving, compared to standard non-P2P offerings.
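To make the role of All-Reduce concrete, here is a minimal pure-Python simulation of the classic ring algorithm. What NCCL actually runs on the hardware is far more optimized; this sketch only shows the neighbor-to-neighbor exchange pattern that a faster P2P data path accelerates:

```python
def ring_all_reduce(vectors):
    """Simulate ring All-Reduce: every worker ends up with the
    element-wise sum of all workers' vectors, exchanging only one
    chunk with a neighbor per step."""
    n = len(vectors)
    size = len(vectors[0])
    assert size % n == 0, "vector length must be divisible by worker count"
    c = size // n                                 # chunk length
    buf = [list(v) for v in vectors]              # per-worker buffers

    # Phase 1: reduce-scatter. After n-1 steps, worker w holds the
    # fully summed chunk (w + 1) % n.
    for step in range(n - 1):
        msgs = []
        for w in range(n):
            idx = (w - step) % n                  # chunk this worker forwards
            msgs.append((idx, buf[w][idx * c:(idx + 1) * c]))
        for w in range(n):
            idx, data = msgs[(w - 1) % n]         # message from left neighbor
            for j, x in enumerate(data):
                buf[w][idx * c + j] += x          # accumulate into local chunk

    # Phase 2: all-gather. Each worker circulates its finished chunk
    # until everyone holds every fully summed chunk.
    for step in range(n - 1):
        msgs = []
        for w in range(n):
            idx = (w + 1 - step) % n              # finished chunk to forward
            msgs.append((idx, buf[w][idx * c:(idx + 1) * c]))
        for w in range(n):
            idx, data = msgs[(w - 1) % n]
            buf[w][idx * c:(idx + 1) * c] = data  # overwrite with summed chunk

    return buf

# Four simulated GPUs, each holding a partial result of length 8.
parts = [[float(w * 10 + i) for i in range(8)] for w in range(4)]
result = ring_all_reduce(parts)
assert all(r == [sum(col) for col in zip(*parts)] for r in result)
```

Each of the 2(n - 1) steps moves only 1/n of the data between adjacent workers, which is why the bandwidth and latency of the GPU-to-GPU link dominate collective performance.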
For your generative AI applications, this technical differentiation translates into:
- Faster user experience: Lower latency means quicker responses from your AI services, enabling more interactive and real-time applications.
- Higher scalability: Increased throughput allows you to serve more concurrent users from a single virtual machine, significantly improving the price-performance and scalability of your service.
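As a quick illustration of how those percentages compound, the arithmetic below applies the quoted deltas to a hypothetical baseline (only the 168% and 41% figures come from the measurements above; the baseline numbers are placeholders):

```python
# Illustrative arithmetic only: the 168% throughput gain and 41%
# inter-token-latency reduction are the quoted G4 P2P improvements;
# the baseline numbers are hypothetical placeholders.
baseline_tokens_per_sec = 1000.0   # hypothetical non-P2P throughput
baseline_inter_token_ms = 50.0     # hypothetical inter-token latency

p2p_tokens_per_sec = baseline_tokens_per_sec * (1 + 1.68)   # up to 168% more
p2p_inter_token_ms = baseline_inter_token_ms * (1 - 0.41)   # up to 41% lower

print(f"throughput: {baseline_tokens_per_sec:.0f} -> {p2p_tokens_per_sec:.0f} tokens/s")
print(f"latency:    {baseline_inter_token_ms:.1f} -> {p2p_inter_token_ms:.1f} ms/token")
```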
Google Cloud services integrated with G4 VMs
G4 VMs are fully integrated with several Google Cloud services, accelerating your AI workloads from day one.
Google Kubernetes Engine (GKE): G4 GPUs are generally available through GKE. Since GKE recently extended Autopilot to all qualifying clusters, including GKE Standard clusters, you can benefit from GKE’s container-optimized compute platform to rapidly scale your G4 GPUs while optimizing costs. By adding the GKE Inference Gateway, you can stretch the benefits of G4 even further to achieve lower AI serving latency and higher throughput.
Vertex AI: Both inference and training benefit significantly from G4’s large GPU memory (96 GB per GPU, 768 GB total), native FP4 precision support, and global presence.
Dataproc: G4 VMs are fully supported on the Dataproc managed analytics platform, letting you accelerate large-scale Spark and Hadoop workloads. This enables data scientists and data engineers to significantly boost performance for machine learning and large-scale data processing workloads.
Cloud Run: We’ve extended our serverless platform’s AI infrastructure options to include the NVIDIA RTX PRO 6000, so you can perform real-time AI inference with your preferred LLMs or media rendering using fully managed, simple, pay-per-use GPUs.
Hyperdisk ML, Managed Lustre, and Cloud Storage: When you need to expand beyond local storage for your HPC and large-scale AI/ML workloads, you can connect G4 to a variety of Google Cloud storage services. For low latency and up to 500K IOPS per instance, Hyperdisk ML is a great option. For high-performance file storage in the same zone, Managed Lustre offers a parallel file system ideal for persistent storage, with up to 1 TB/s of throughput. Finally, if you need nearly unlimited global capacity, with powerful capabilities like Anywhere Cache for use cases like inference, choose Cloud Storage as your primary, highly available, and globally scalable storage platform for training datasets, model artifacts, and feature stores.
What customers are saying
Here’s how customers are using G4 to innovate and accelerate within their businesses:
“The combination of NVIDIA Omniverse on Google Cloud G4 VMs is the true engine for our creative transformation. It empowers our teams to compress weeks of traditional production into hours, allowing us to instantly generate photorealistic 3D advertising environments at a global scale while ensuring pixel-perfect brand compliance—a capability that redefines speed and personalization in digital marketing.” – Perry Nightingale, SVP Creative AI, WPP
“We’re excited to bring the power of Google Cloud G4 VMs into Altair One, so you can run your most demanding simulation and fluid dynamics workloads with the speed, scale, and visual fidelity needed to push innovation further.” – Yeshwant Mummaneni, Chief Engineer – Analytics, HPC, IoT & Digital Twin, Altair
The Google Cloud advantage
Choosing Google Cloud means selecting a platform engineered for tangible results. The new G4 VM is a prime example, with our custom P2P interconnect unlocking up to 168% more throughput from the underlying NVIDIA RTX PRO 6000 Blackwell GPUs. This focus on optimized performance extends across our comprehensive portfolio; the G4 perfectly complements our existing A-series and G2 VMs, ensuring you have the ideal infrastructure for any workload. Beyond raw performance, we deliver turnkey solutions to accelerate your time to value. With NVIDIA Omniverse now available on the Google Cloud Marketplace, you can immediately deploy enterprise-grade digital twin and simulation applications on a fully managed and scalable platform.
G4 capacity is immediately available. To get started, simply select G4 VMs from the Google Cloud console. NVIDIA Omniverse and Isaac Sim are qualified Google Cloud Marketplace solutions that can draw down on your Google Cloud commitments; for more information, please contact your Google Cloud sales team or reseller.