Tracking and managing assets used in AI development with Amazon SageMaker AI 

Building custom foundation models requires coordinating multiple assets across the development lifecycle such as data assets, compute infrastructure, model architecture and frameworks, lineage, and production deployments. Data scientists create and refine training datasets, develop custom evaluators to assess model quality and safety, and iterate through fine-tuning configurations to optimize performance. As these workflows scale across …

Automate AI and HPC clusters with Cluster Director, now generally available

The complexity of the infrastructure behind AI training and high performance computing (HPC) workloads can really slow teams down. At Google Cloud, where we work with some of the world’s largest AI research teams, we see it everywhere we go: researchers hampered by complex configuration files, platform teams struggling to manage GPUs with home-grown scripts, …

GIE-Bench: Towards Grounded Evaluation for Text-Guided Image Editing

Editing images using natural language instructions has become a natural and expressive way to modify visual content; yet, evaluating the performance of such models remains challenging. Existing evaluation approaches often rely on image-text similarity metrics like CLIP, which lack precision. In this work, we introduce a new benchmark designed to evaluate text-guided image editing models …

Palantir and L3Harris

Reindustrializing Defense Through AI-Powered Production. Editor’s Note: This is a joint blog post written by Palantir and L3Harris about our partnership to reindustrialize America’s defense industrial base. Introduction: The challenge facing America’s defense industrial base is not just about speed; it’s about rebuilding the foundation that makes speed possible. Our nation’s ability to rapidly reindustrialize and …

Governance by design: The essential guide for successful AI scaling

Picture this: Your enterprise has just deployed its first generative AI application. The initial results are promising, but as you plan to scale across departments, critical questions emerge. How will you enforce consistent security, prevent model bias, and maintain control as AI applications multiply? It turns out you’re not alone. A McKinsey survey spanning 750+ …

How Temporal Powers Reliable Cloud Operations at Netflix

By Jacob Meyers and Rob Zienert

Temporal is a Durable Execution platform that allows you to write code “as if failures don’t exist”. It has become increasingly critical to Netflix since its initial adoption in 2021, with users ranging from the operators of our Open Connect global CDN to our Live reliability teams now depending on Temporal …

Checkpointless training on Amazon SageMaker HyperPod: Production-scale training with faster fault recovery

Foundation model training has reached an inflection point where traditional checkpoint-based recovery methods are becoming a bottleneck to efficiency and cost-effectiveness. As models grow to trillions of parameters and training clusters expand to thousands of AI accelerators, even minor disruptions can result in significant costs and delays. In this post, we introduce checkpointless training on …

Connect your enterprise data to Google’s new Antigravity IDE

The AI state of the art is shifting rapidly from simple chat interfaces to autonomous agents capable of planning, executing, and refining complex workflows. In this new landscape, the ability to ground these intelligent agents in your enterprise data is key to unlocking true business value. Google Cloud is at the forefront of this shift, …

A developer’s guide to Gemini Live API in Vertex AI

Give your AI apps and agents a natural, almost human-like interface, all through a single WebSocket connection. Today, we announced the general availability of Gemini Live API on Vertex AI, which is powered by the latest Gemini 2.5 Flash Native Audio model. This is more than just a model upgrade; it represents a fundamental move …

3 Actionable AI Recommendations for Businesses in 2026

TL;DR: In 2026, the businesses that win with AI will do three things differently: redesign core workflows around AI agents, treat AI as an operating system rather than a toolset, and deliberately restructure human work to compound AI advantages instead of fighting them. By 2026, AI will no longer be a differentiator by itself. Nearly …