Today, we’re happy to announce updates to Amazon SageMaker Experiments, a capability of Amazon SageMaker that lets you organize, track, compare, and evaluate machine learning (ML) experiments and model versions from any integrated development environment (IDE) using the SageMaker Python SDK or boto3, including local Jupyter notebooks.
Machine learning is an iterative process. When solving a new use case, data scientists and ML engineers iterate through various parameters to find the best model configurations (also known as hyperparameters) that can be used in production to solve the identified business challenge. Over time, after experimenting with multiple models and hyperparameters, it becomes difficult for ML teams to efficiently manage model runs and find the optimal one without a tool that keeps track of the different experiments. Experiment tracking systems streamline the process of comparing different iterations and help simplify collaboration and communication in a team, thereby increasing productivity and saving time. They do this by organizing and managing ML experiments so that it is effortless to draw conclusions from them, for example, finding the training run with the best accuracy.
To solve this challenge, SageMaker provides SageMaker Experiments, a fully integrated SageMaker capability. It provides the flexibility to log your model metrics, parameters, files, and artifacts; plot charts from the different metrics; capture various metadata; and search through your experiments, and it supports model reproducibility. Data scientists can quickly compare the performance and hyperparameters for model evaluation through visual charts and tables. They can also use SageMaker Experiments to download the created charts and share the model evaluation with their stakeholders.
With the new updates to SageMaker Experiments, it is now part of the SageMaker SDK, simplifying data scientists' work and eliminating the need to install an extra library to manage multiple model executions. We are introducing the following new core concepts:

- Experiment: A collection of runs that are grouped together and can be compared against each other.
- Run: Each execution step of the model training process. A run consists of all the inputs, parameters, configurations, and results for one iteration of model training. Custom parameters and metrics can be logged using the log_parameter, log_parameters, and log_metric functions. Custom input and output can be logged using the log_file function.

The concepts that are implemented as part of a Run class are made available from any IDE where the SageMaker Python SDK is installed. For SageMaker Training, Processing, and Transform jobs, the SageMaker Experiments run is automatically passed to the job if the job is invoked within a run context. You can recover the run object using load_run() from your job. Finally, with the new functionalities' integration, data scientists can also automatically log a confusion matrix, precision and recall graphs, and a ROC curve for classification use cases using the run.log_confusion_matrix, run.log_precision_recall, and run.log_roc_curve functions, respectively.
In this blog post, we will provide examples of how to use the new SageMaker Experiments functionalities in a Jupyter notebook via the SageMaker SDK. We will demonstrate these capabilities using a PyTorch example that trains an MNIST handwritten digits classification model. The experiment will be organized as follows:

1. Create an experiment run and track the dataset parameters and raw data as inputs.
2. Train the model, logging model parameters and performance metrics, including a confusion matrix.
3. Train the model with different hyperparameter combinations as separate runs and compare their performance.
4. Log experiments from SageMaker jobs, recovering the run context with the load_run function inside your job.

For this blog post, we will use Amazon SageMaker Studio to showcase how to log metrics from a Studio notebook using the updated SageMaker Experiments functionalities. To execute the commands presented in our example, you need the following prerequisites:

- A SageMaker Domain and user profile
- A SageMaker Studio notebook with at least an ml.t3.medium instance type

If you do not have a SageMaker Domain and user profile available, you can create one using this quick setup guide.
For this exercise, we will use torchvision, a PyTorch package that provides popular datasets, model architectures, and common image transformations for computer vision. SageMaker Studio provides a set of Docker images for common data science use cases that are made available in Amazon ECR. For PyTorch, you have the option of selecting images optimized for CPU or GPU training. For this example, we will select the PyTorch 1.12 Python 3.8 CPU Optimized image and the Python 3 kernel. The examples described below will focus on the SageMaker Experiments functionalities and are not code complete.
Let’s download the data with the torchvision package and track the number of data samples for the train and test datasets as parameters with SageMaker Experiments. For this example, let’s assume train_set and test_set are already downloaded torchvision datasets.
In this example, we use run.log_parameters to log the number of train and test data samples and run.log_file to upload the raw datasets to Amazon S3 and log them as inputs to our experiment.
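The snippet below is a minimal sketch of this step, assuming train_set and test_set are torchvision MNIST datasets whose raw files live under ./data/MNIST/raw; the experiment and run names are hypothetical:

```python
# A minimal sketch: create a run and log dataset parameters and raw inputs.
from sagemaker.session import Session
from sagemaker.experiments.run import Run

session = Session()

# The experiment and run names below are hypothetical.
with Run(
    experiment_name="local-experiment-example",
    run_name="dataset-preparation",
    sagemaker_session=session,
) as run:
    # Track the dataset sizes as run parameters
    run.log_parameters({
        "num_train_samples": len(train_set.data),
        "num_test_samples": len(test_set.data),
    })
    # Upload the raw dataset files to Amazon S3 and log them as run inputs
    run.log_file("data/MNIST/raw/train-images-idx3-ubyte.gz", is_output=False)
    run.log_file("data/MNIST/raw/train-labels-idx1-ubyte.gz", is_output=False)
```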
Now that we’ve downloaded our MNIST dataset, let’s train a CNN model to recognize the digits. While training the model, we want to load our existing experiment run, log new parameters to it, and track the model performance by logging model metrics.
We can use the load_run function to load our previous run and use it to log our model training. We can then use run.log_parameter and run.log_parameters to log one or multiple model parameters to our run.
And we can use run.log_metric to log performance metrics to our experiment.
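Put together, a training loop might look like the following minimal sketch; the experiment and run names match the earlier hypothetical ones, and train_one_epoch and evaluate are assumed helper functions:

```python
# A minimal sketch: reopen the run created earlier and log training details.
from sagemaker.session import Session
from sagemaker.experiments.run import load_run

with load_run(
    experiment_name="local-experiment-example",  # hypothetical names
    run_name="dataset-preparation",
    sagemaker_session=Session(),
) as run:
    # Log a single parameter, then several at once
    run.log_parameter("device", "cpu")
    run.log_parameters({
        "optimizer": "sgd",
        "epochs": 5,
        "hidden_channels": 10,
    })

    for epoch in range(1, 6):
        # train_one_epoch and evaluate are assumed (hypothetical) helpers
        train_loss = train_one_epoch(model, train_loader)
        test_loss, test_accuracy = evaluate(model, test_loader)
        # Log performance metrics per training step (epoch)
        run.log_metric(name="train:loss", value=train_loss, step=epoch)
        run.log_metric(name="test:loss", value=test_loss, step=epoch)
        run.log_metric(name="test:accuracy", value=test_accuracy, step=epoch)
```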
For classification models, you can also use run.log_confusion_matrix, run.log_precision_recall, and run.log_roc_curve to automatically plot the confusion matrix, precision-recall graph, and ROC curve of your model. Since our model solves a multiclass classification problem, let’s log only the confusion matrix for it.
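A call along the following lines logs it, assuming y_true and y_pred are label lists collected during the evaluation loop:

```python
# A minimal sketch: y_true and y_pred are assumed evaluation outputs.
run.log_confusion_matrix(y_true, y_pred, title="Confusion-Matrix")
```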
When looking at our run details, we can now see the generated metrics as shown in the screenshot below:
The run details page provides further information about the metrics.
And the new model parameters are tracked on the parameters overview page.
You can also analyze your model performance by class using the automatically plotted confusion matrix, which can also be downloaded and used for different reports. And you can plot extra graphs to analyze the performance of your model based on the logged metrics.
As a data scientist, you want to find the best possible model. That includes training a model multiple times with different hyperparameters and comparing the performance of the model with those hyperparameters. To do so, SageMaker Experiments allows us to create multiple runs in the same experiment. Let’s explore this concept by training our model with different num_hidden_channels and optimizer values, as sketched below.
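A minimal sketch of this step might look as follows, where train_model is an assumed helper that trains the CNN and logs its metrics and confusion matrix to the active run:

```python
# A minimal sketch: one run per hyperparameter combination (2 x 3 = 6 runs).
from sagemaker.session import Session
from sagemaker.experiments.run import Run

session = Session()

for num_hidden_channel in [5, 10]:
    for optimizer in ["adam", "rmsprop", "sgd"]:
        run_name = f"cnn-training-{num_hidden_channel}-{optimizer}"
        with Run(
            experiment_name="local-experiment-example",  # hypothetical name
            run_name=run_name,
            sagemaker_session=session,
        ) as run:
            run.log_parameters({
                "num_hidden_channels": num_hidden_channel,
                "optimizer": optimizer,
            })
            # train_model is a hypothetical helper that trains and logs
            train_model(run, num_hidden_channel, optimizer)
```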
We are now creating six new runs for our experiment. Each one will log the model parameters, metrics, and confusion matrix. We can then compare the runs to select the best-performing model for the problem. When analyzing the runs, we can plot the metric graphs for the different runs as a single plot, comparing the performance of the runs across the different training steps (or epochs).
In the example above, we used SageMaker Experiments to log model performance from a SageMaker Studio notebook where the model was trained locally in the notebook. We can do the same to log model performance from SageMaker processing, training and batch transform jobs. With the new automatic context passing capabilities, we do not need to specifically share the experiment configuration with the SageMaker job, as it will be automatically captured.
The example below will focus on the SageMaker Experiments functionalities and is not code complete.
In our model script file, we can get the run context using load_run(). In SageMaker processing and training jobs, we do not need to provide the experiment configuration; it is loaded automatically. For batch transform jobs, we need to provide experiment_name and run_name to load the experiment’s configuration.
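As a minimal sketch, a training or processing script could recover and use the run like this (metric names and values are illustrative):

```python
# A minimal sketch of a script executed inside a SageMaker job.
from sagemaker.session import Session
from sagemaker.experiments.run import load_run

# In training and processing jobs launched within a run context,
# load_run() recovers the run without any extra configuration.
with load_run(sagemaker_session=Session()) as run:
    run.log_parameter("device", "cpu")
    # ... training loop; test_accuracy and epoch come from the elided code ...
    run.log_metric(name="test:accuracy", value=test_accuracy, step=epoch)

# In batch transform jobs, the names must be passed explicitly (hypothetical):
# with load_run(experiment_name="local-experiment-example",
#               run_name="transform-run") as run:
#     ...
```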
In addition to the information we get when running SageMaker Experiments from a notebook script, the run from a SageMaker job will automatically populate the job parameters and outputs.
The new SageMaker Experiments SDK also ensures backwards compatibility with the previous version using the concepts of trials and trial components. Any experiment triggered using the previous SageMaker Experiments version will be automatically made available in the new UI, for analyzing the experiments.
SageMaker Clarify helps improve our ML models by detecting potential bias and helping explain how these models make predictions. Clarify provides pre-built containers that run as SageMaker processing jobs after your model has been trained, using information about your data (data configuration), model (model configuration), and the sensitive data columns that we want to analyze for possible bias (bias configuration). Up until now, SageMaker Experiments displayed our model training and Clarify reports as individual trial components that were connected via a trial.
With the new SageMaker Experiments, we can also integrate SageMaker Clarify reports with our model training, giving us one source of truth that allows us to further understand our model. For an integrated report, all we need to do is use the same run name for our training and Clarify jobs. The following example demonstrates how we can integrate the reports using an XGBoost model to predict the income of adults across the United States. The model uses the UCI Adult dataset. For this exercise, we assume that the model was already trained and that we already calculated the data, model, and bias configurations.
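One way to do this, sketched under the assumption that data_config, model_config, and bias_config are existing sagemaker.clarify configuration objects and that the training job logged to the (hypothetical) run names below, is to launch the Clarify processing job inside the same run context:

```python
# A minimal sketch: attach a Clarify bias report to the training run.
from sagemaker import clarify, get_execution_role
from sagemaker.session import Session
from sagemaker.experiments.run import load_run

session = Session()
clarify_processor = clarify.SageMakerClarifyProcessor(
    role=get_execution_role(),  # assumes an existing SageMaker execution role
    instance_count=1,
    instance_type="ml.m5.xlarge",
    sagemaker_session=session,
)

# Reopening the run used for model training (hypothetical names) and starting
# the Clarify job inside it merges the report into the same run.
with load_run(
    experiment_name="adult-income-experiment",
    run_name="xgboost-training",
    sagemaker_session=session,
) as run:
    clarify_processor.run_bias(
        data_config=data_config,
        bias_config=bias_config,
        model_config=model_config,
    )
```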
With this setup, we get a combined view that includes the model metrics, joint inputs and outputs, and the Clarify reports for model statistical bias and explainability.
In this post, we explored the new generation of SageMaker Experiments, an integrated part of the SageMaker SDK. We demonstrated how to log your ML workflows from anywhere with the new Run class. We presented the new Experiments UI that allows you to track your experiments and plot graphs for a single run metric, as well as to compare multiple runs with the new analysis capability. We provided examples of logging experiments from a SageMaker Studio notebook and from SageMaker jobs. Finally, we showed how to integrate model training and SageMaker Clarify reports in a unified view, allowing you to further understand your model.
We encourage you to try out the new Experiments functionalities and connect with the Machine Learning & AI community if you have any questions or feedback!