Skewness Be Gone: Transformative Tricks for Data Scientists

Data transformations enable data scientists to refine, normalize, and standardize raw data into a format ripe for analysis. These transformations are not merely procedural steps; they are essential in mitigating biases, handling skewed distributions, and enhancing the robustness of statistical models. This post will primarily focus on how to address skewed data. By focusing on …

Croissant: a metadata format for ML-ready datasets

Posted by Omar Benjelloun, Software Engineer, Google Research, and Peter Mattson, Software Engineer, Google Core ML and President, MLCommons Association. Machine learning (ML) practitioners looking to reuse existing datasets to train an ML model often spend a lot of time understanding the data, making sense of its organization, or figuring out what subset to use …

9 ways developer productivity is boosted by generative AI

Software development is one arena where we are already seeing significant impacts from generative AI tools. The benefits are many, and significant productivity gains are currently available to enterprises that embrace these tools. A McKinsey study claims that software developers can complete coding tasks up to twice as fast with generative AI. The consulting firm’s …

Efficiently fine-tune the ESM-2 protein language model with Amazon SageMaker

In this post, we demonstrate how to efficiently fine-tune a state-of-the-art protein language model (pLM) to predict protein subcellular localization using Amazon SageMaker. Proteins are the molecular machines of the body, responsible for everything from moving your muscles to responding to infections. Despite this variety, all proteins are made of repeating chains of molecules called …

Running AI on fully managed GKE, now with new compute options, pricing and resource reservations

Kubernetes is a popular way to run AI workloads such as training and large language model (LLM) serving, including our new open model Gemma. Google Kubernetes Engine (GKE) in Autopilot mode provides a fully managed Kubernetes platform that offers the power and flexibility of Kubernetes but without the need to worry about compute nodes, so you …