Google’s Cloud TPU v4 provides exaFLOPS-scale ML with industry-leading efficiency

Editor’s note: Today, two legendary Google engineers describe the “secret sauce” that has made TPU v4 a platform of choice for the world’s leading AI researchers and developers for training machine learning models at scale. Norm Jouppi is the chief architect for all of Google’s TPUs, from TPU v1 to TPU v4. He is a Google …

NVIDIA Takes Inference to New Heights Across MLPerf Tests

MLPerf remains the definitive measurement for AI performance as an independent, third-party benchmark. NVIDIA’s AI platform has consistently shown leadership across both training and inference since the inception of MLPerf, including the MLPerf Inference 3.0 benchmarks released today. “Three years ago when we introduced A100, the AI world was dominated by computer vision. Generative AI …

Figure 1: Architecture for Automated FAQ Update for Amazon Kendra

Automate and implement version control for Amazon Kendra FAQs

Amazon Kendra is an intelligent search service powered by machine learning (ML). Amazon Kendra reimagines enterprise search for your websites and applications so your employees and customers can easily find the content they’re looking for, even when it’s scattered across multiple locations and content repositories within your organization. Amazon Kendra FAQs allow users to upload …

Meet the Data Champions: How Goodcall is bringing the power of AI to Main Street Businesses

In our new blog series, “Meet our Data Champions,” we showcase the exciting work Google Cloud customers are doing with data and AI/ML. In this edition, we talk to Bob Summers, CEO and founder of Goodcall, a company whose phone AI service leverages Google Cloud Speech AI technologies to bring the power of AI to …

TRACT: Denoising Diffusion Models with Transitive Closure Time-Distillation

Denoising Diffusion models have demonstrated their proficiency for generative sampling. However, generating good samples often requires many iterations. Consequently, techniques such as binary time-distillation (BTD) have been proposed to reduce the number of network calls for a fixed architecture. In this paper, we introduce TRAnsitive Closure Time-distillation (TRACT), a new method that extends BTD. For …

Generate a counterfactual analysis of corn response to nitrogen with Amazon SageMaker JumpStart solutions

In his book The Book of Why, Judea Pearl advocates for teaching cause and effect principles to machines in order to enhance their intelligence. The accomplishments of deep learning are essentially just a type of curve fitting, whereas causality could be used to uncover interactions between the systems of the world under various constraints without …

How you can automate ML experiment tracking with Vertex AI Experiments autologging

Practical machine learning (ML) is a trial-and-error process. ML practitioners compare different performance metrics by running ML experiments until they find the best model with a given set of parameters. Because of the experimental nature of ML, there are many reasons for tracking ML experiments and making them reproducible, including debugging and compliance. …

Scaling vision transformers to 22 billion parameters

Posted by Piotr Padlewski and Josip Djolonga, Software Engineers, Google Research. Large Language Models (LLMs) like PaLM or GPT-3 showed that scaling transformers to hundreds of billions of parameters improves performance and unlocks emergent abilities. The biggest dense models for image understanding, however, have reached only 4 billion parameters, despite research indicating that promising multimodal …

Reduce call hold time and improve customer experience with self-service virtual agents using Amazon Connect and Amazon Lex

This post was co-written with Tony Momenpour and Drew Clark from KYTC. Government departments and businesses operate contact centers to connect with their communities, enabling citizens and customers to call to make appointments, request services, and sometimes just ask a question. When there are more calls than agents can answer, callers get placed on hold …