image 1 25

Automate advanced agentic RAG pipeline with Amazon SageMaker AI

Retrieval Augmented Generation (RAG) is a fundamental approach for building advanced generative AI applications that connect large language models (LLMs) to enterprise knowledge. However, crafting a reliable RAG pipeline is rarely a one-shot process. Teams often need to test dozens of configurations (varying chunking strategies, embedding models, retrieval techniques, and prompt designs) before arriving at …

ml19267 1

Enhance video understanding with Amazon Bedrock Data Automation and open-set object detection

In real-world video and image analysis, businesses often face the challenge of detecting objects that weren’t part of a model’s original training set. This becomes especially difficult in dynamic environments where new, unknown, or user-defined objects frequently appear. For example, media publishers might want to track emerging brands or products in user-generated content; advertisers need …

ML 19569 1

TII Falcon-H1 models now available on Amazon Bedrock Marketplace and Amazon SageMaker JumpStart

This post was co-authored with Jingwei Zuo from TII. We are excited to announce the availability of the Technology Innovation Institute (TII)’s Falcon-H1 models on Amazon Bedrock Marketplace and Amazon SageMaker JumpStart. With this launch, developers and data scientists can now use six instruction-tuned Falcon-H1 models (0.5B, 1.5B, 1.5B-Deep, 3B, 7B, and 34B) on AWS, and have access …

1 xN8hT4Emax 1000x1000 1

Scaling high-performance inference cost-effectively

At Google Cloud Next 2025, we announced new inference capabilities with GKE Inference Gateway, including support for vLLM on TPUs, Ironwood TPUs, and Anywhere Cache.  Our inference solution is based on AI Hypercomputer, a system built on our experience running models like Gemini and Veo 3, which serve over 980 trillion tokens a month to …

cooksbar 100x133 1

Powering innovation at scale: How AWS is tackling AI infrastructure challenges

As generative AI continues to transform how enterprises operate—and develop net new innovations—the infrastructure demands for training and deploying AI models have grown exponentially. Traditional infrastructure approaches are struggling to keep pace with today’s computational requirements, network demands, and resilience needs of modern AI workloads. At AWS, we’re also seeing a transformation across the technology …

Introducing the Agentic SOC Workshops for security professionals

The security operations centers of the future will use agentic AI to enable intelligent automation of routine tasks, augment human decision-making, and streamline workflows. At Google Cloud, we want to help prepare today’s security professionals to get the most out of tomorrow’s AI agents. As we build our agentic vision, we’re also excited to invite …

ml 19284 1

Maximize HyperPod Cluster utilization with HyperPod task governance fine-grained quota allocation

We are excited to announce the general availability of fine-grained compute and memory quota allocation with HyperPod task governance. With this capability, customers can optimize Amazon SageMaker HyperPod cluster utilization on Amazon Elastic Kubernetes Service (Amazon EKS), distribute fair usage, and support efficient resource allocation across different teams or projects. For more information, see HyperPod task governance best …

Registration now open: Our no-cost, generative AI training and certification program for veterans

Growing up in a Navy family instilled a strong sense of purpose in me. My father’s remarkable 42 years of naval service not only shaped my values, but inspired me to join the Navy myself. Today, as the leader of Google Public Sector, I’m honored to continue that tradition of service in support of our …

SMHP Solution Architecture

Accelerating HPC and AI research in universities with Amazon SageMaker HyperPod

This post was written with Mohamed Hossam of Brightskies. Research universities engaged in large-scale AI and high-performance computing (HPC) often face significant infrastructure challenges that impede innovation and delay research outcomes. Traditional on-premises HPC clusters come with long GPU procurement cycles, rigid scaling limits, and complex maintenance requirements. These obstacles restrict researchers’ ability to iterate …

Investigate fast with AI: Gemini Cloud Assist for Dataproc & Serverless for Apache Spark

Apache Spark is a fundamental part of most modern lakehouse architectures, and Google Cloud’s Dataproc provides a powerful, fully managed platform for running Spark applications. However, for data engineers and scientists, debugging failures and performance bottlenecks in distributed systems remains a universal challenge. Manually troubleshooting a Spark job requires piecing together clues from disparate sources …