Amazon SageMaker Studio is a fully integrated development environment (IDE) for machine learning (ML) partly based on JupyterLab 3. Studio provides a web-based interface to interactively perform ML development tasks required to prepare data and build, train, and deploy ML models. In Studio, you can load data, adjust ML models, move in between steps to adjust experiments, compare results, and deploy ML models for inference.
The AWS Cloud Development Kit (AWS CDK) is an open-source software development framework to create AWS CloudFormation stacks through automatic CloudFormation template generation. A stack is a collection of AWS resources that can be programmatically updated, moved, or deleted. AWS CDK constructs are the building blocks of AWS CDK applications, representing the blueprint to define cloud architectures.
Setting up Studio with AWS CDK has become a streamlined process. The AWS CDK allows you to use native constructs to define and deploy Studio using infrastructure as code (IaC), including AWS Identity and Access Management (AWS IAM) permissions and desired cloud resource configurations, all in one place. This development approach can be used in combination with other common software engineering best practices such as automated code deployments, tests, and CI/CD pipelines. The AWS CDK reduces the time required to perform typical infrastructure deployment tasks while shrinking the surface area for human error through automation.
This post guides you through the steps to get started with setting up and deploying Studio to standardize ML model development and collaboration with fellow ML engineers and ML scientists. All examples in the post are written in the Python programming language. However, the AWS CDK offers built-in support for multiple other programming languages like JavaScript, Java and C#.
To get started, the following prerequisites apply:
First, let’s clone the GitHub repository.
When the repository is successfully pulled, you may inspect the cdk directory containing the following resources:
The two main files we want to look at in the cdk subdirectory are sagemaker_studio_construct.py and sagemaker_studio_stack.py. Let’s look at each file in more detail.
The Studio construct is defined in the sagemaker_studio_construct.py file.
The Studio construct takes in the virtual private cloud (VPC), listed users, AWS Region, and underlying default instance type as parameters. This AWS CDK construct serves the following functions:
Creates the Studio domain (SageMakerStudioDomain).
Sets the IAM role sagemaker_studio_execution_role with the AmazonSageMakerFullAccess permissions required to create resources. These permissions need to be scoped down further to follow the least privilege principle for improved security.
Sets the Jupyter server app settings, taking in JUPYTER_SERVER_APP_IMAGE_NAME, which defines the jupyter-server-3 container image to be used.
Sets the kernel gateway app settings, taking in KERNEL_GATEWAY_APP_IMAGE_NAME, which defines the datascience-2.0 container image to be used.
The following code snippet shows the relevant Studio domain AWS CloudFormation resources defined in AWS CDK:
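As a rough, dependency-free illustration of what such a domain definition amounts to, the sketch below builds the CloudFormation resource for a Studio domain as a plain Python dictionary. It is not the construct code from the repository. The function names, the placeholder account ID, and the example VPC, subnet, and role values are assumptions made for illustration; the property names follow the AWS::SageMaker::Domain resource type.

```python
import json


def sagemaker_image_arn(region, image_account_id, image_name):
    """Build a SageMaker image ARN. The hosting account differs per Region."""
    return f"arn:aws:sagemaker:{region}:{image_account_id}:image/{image_name}"


def studio_domain_resource(domain_name, vpc_id, subnet_ids, execution_role_arn,
                           jupyter_image_arn, kernel_image_arn, instance_type):
    """Return an AWS::SageMaker::Domain resource as a CloudFormation-style dict."""
    return {
        "Type": "AWS::SageMaker::Domain",
        "Properties": {
            "AuthMode": "IAM",
            "DomainName": domain_name,
            "VpcId": vpc_id,
            "SubnetIds": list(subnet_ids),
            "DefaultUserSettings": {
                "ExecutionRole": execution_role_arn,
                # The Jupyter server itself runs on the "system" instance type.
                "JupyterServerAppSettings": {
                    "DefaultResourceSpec": {
                        "InstanceType": "system",
                        "SageMakerImageArn": jupyter_image_arn,
                    }
                },
                # Kernels run on the default instance type passed to the construct.
                "KernelGatewayAppSettings": {
                    "DefaultResourceSpec": {
                        "InstanceType": instance_type,
                        "SageMakerImageArn": kernel_image_arn,
                    }
                },
            },
        },
    }


# All values below are placeholders, not real account or network identifiers.
REGION = "us-east-1"
IMAGE_ACCOUNT_ID = "111122223333"

domain = studio_domain_resource(
    domain_name="studio-domain",
    vpc_id="vpc-0123456789abcdef0",
    subnet_ids=["subnet-0123456789abcdef0"],
    execution_role_arn="arn:aws:iam::111122223333:role/sagemaker_studio_execution_role",
    jupyter_image_arn=sagemaker_image_arn(REGION, IMAGE_ACCOUNT_ID, "jupyter-server-3"),
    kernel_image_arn=sagemaker_image_arn(REGION, IMAGE_ACCOUNT_ID, "datascience-2.0"),
    instance_type="ml.t3.medium",
)
print(json.dumps(domain, indent=2))
```

In an AWS CDK application, the same properties would typically be passed to the SageMaker domain construct, and the AWS CDK would generate this template section for you.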
The following code snippet shows the user profiles created from AWS CloudFormation resources:
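To make the per-user pattern concrete, here is a minimal sketch that generates one AWS::SageMaker::UserProfile resource for each listed user, again as plain Python dictionaries rather than the repository's construct code. The helper name, the logical ID scheme, and the example user names are hypothetical; the domain's logical ID SageMakerStudioDomain is taken from the construct description above.

```python
import re


def user_profile_resources(domain_logical_id, user_names):
    """Return a dict of AWS::SageMaker::UserProfile resources keyed by logical ID."""
    resources = {}
    for name in user_names:
        # CloudFormation logical IDs must be alphanumeric, so strip other characters.
        logical_id = "UserProfile" + re.sub(r"[^A-Za-z0-9]", "", name.title())
        if logical_id in resources:
            raise ValueError(f"user name {name!r} collides with an existing logical ID")
        resources[logical_id] = {
            "Type": "AWS::SageMaker::UserProfile",
            "Properties": {
                # Referencing the domain also makes each profile depend on it.
                "DomainId": {"Fn::GetAtt": [domain_logical_id, "DomainId"]},
                "UserProfileName": name,
            },
        }
    return resources


profiles = user_profile_resources(
    "SageMakerStudioDomain", ["data-scientist-1", "ml-engineer-1"]
)
for logical_id, resource in profiles.items():
    print(logical_id, "->", resource["Properties"]["UserProfileName"])
```

Because every profile references the domain ID, AWS CloudFormation creates the domain before any of the user profiles.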
After the construct has been defined, you can add it to the stack by creating an instance of the class and passing the required arguments. The stack creates the AWS CloudFormation resources as part of one coherent deployment. This means that if at least one cloud resource fails to be created, the CloudFormation stack rolls back any changes performed. The following code snippet shows the Studio construct instantiated inside the Studio stack:
To deploy your AWS CDK stack, run the following commands from the project’s root directory within your terminal window:
aws configure
pip3 install -r requirements.txt
cdk bootstrap --app "python3 -m cdk.app"
cdk deploy --app "python3 -m cdk.app"
Review the resources the AWS CDK creates in your AWS account and select yes when prompted to deploy the stack. Wait for your stack deployment to finish. This typically takes less than 5 minutes; however, adding more resources will prolong deployment time. You can also check the deployment status on the AWS CloudFormation console.
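If you prefer to check the deployment status from a script instead of the console, a polling loop along the following lines is one option. This is a sketch under assumptions: the stack name is hypothetical, and a small stub client stands in for the real CloudFormation client so that the example runs without AWS credentials. With Boto3 installed, you would pass boto3.client("cloudformation") instead of the stub.

```python
import time


def wait_for_stack(cfn_client, stack_name, poll_seconds=15, max_polls=40,
                   sleep=time.sleep):
    """Poll describe_stacks until the stack leaves an *_IN_PROGRESS state."""
    for _ in range(max_polls):
        response = cfn_client.describe_stacks(StackName=stack_name)
        status = response["Stacks"][0]["StackStatus"]
        if not status.endswith("_IN_PROGRESS"):
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"{stack_name} still in progress after {max_polls} polls")


class StubCloudFormation:
    """Offline stand-in for a CloudFormation client, for demonstration only."""

    def __init__(self, statuses):
        self._statuses = list(statuses)

    def describe_stacks(self, StackName):
        # Mimics the shape of the real describe_stacks response.
        status = self._statuses.pop(0)
        return {"Stacks": [{"StackName": StackName, "StackStatus": status}]}


stub = StubCloudFormation(
    ["CREATE_IN_PROGRESS", "CREATE_IN_PROGRESS", "CREATE_COMPLETE"]
)
# "sagemaker-studio-stack" is a hypothetical stack name.
final_status = wait_for_stack(stub, "sagemaker-studio-stack", sleep=lambda _: None)
print(final_status)
```

A final status such as ROLLBACK_COMPLETE indicates that resource creation failed and the stack rolled back its changes.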
When the stack has been successfully deployed, check its information by going to the Studio Control Panel. You should see the SageMaker Studio user profile you created.
If you redeploy the stack, it checks for changes and performs only the cloud resource updates that are necessary. For example, you can add users or change the permissions of existing users without having to recreate all of the defined cloud resources.
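The idea behind such incremental updates can be illustrated with a simplified comparison of two sets of template resources. This is only a conceptual sketch with hypothetical resource names; it does not reproduce how AWS CloudFormation actually computes its change sets.

```python
def diff_resources(old, new):
    """Compare two dicts of template resources keyed by logical ID."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "modified": sorted(
            key for key in old.keys() & new.keys() if old[key] != new[key]
        ),
    }


def profile(name):
    """Minimal user profile resource, used only to build the example templates."""
    return {
        "Type": "AWS::SageMaker::UserProfile",
        "Properties": {"UserProfileName": name},
    }


domain = {"Type": "AWS::SageMaker::Domain", "Properties": {"AuthMode": "IAM"}}

deployed = {
    "SageMakerStudioDomain": domain,
    "UserProfileAlice": profile("alice"),
}
# The redeployed template adds one user and leaves everything else unchanged.
redeployed = {
    "SageMakerStudioDomain": domain,
    "UserProfileAlice": profile("alice"),
    "UserProfileBob": profile("bob"),
}

changes = diff_resources(deployed, redeployed)
print(changes)
```

In this example only the new user profile appears under "added", so the domain and the existing profile would be left untouched.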
To delete a stack, complete the following steps:
AWS CloudFormation will delete the resources created when the stack was deployed. This may take some time depending on the number of resources created.
If you encounter any issues going through these cleanup steps, you may need to manually delete the Studio domain first before repeating the steps in this section.
In this post, we showed how to use AWS cloud-native IaC resources to build an easily reusable template for Studio deployments. SageMaker Studio is a fully integrated web-based IDE that provides a visual interface for ML development tasks based on JupyterLab 3. With AWS CDK stacks, we were able to define constructs for building out cloud components that can be easily modified or deleted by making changes to the underlying CloudFormation stack.
For more information about SageMaker Studio, see Amazon SageMaker Studio.