Dr. Vivek Madan 1 2
In December 2020, AWS announced the general availability of Amazon SageMaker JumpStart, a capability of Amazon SageMaker that helps you quickly and easily get started with machine learning (ML). JumpStart provides one-click fine-tuning and deployment of a wide variety of pre-trained models across popular ML tasks, as well as a selection of end-to-end solutions that solve common business problems. These features remove the heavy lifting from each step of the ML process, making it easier to develop high-quality models and reducing time to deployment.
This post is the fourth in a series on using JumpStart for specific ML tasks. In the first post, we showed how to run image classification use cases on JumpStart. In the second post, we demonstrated how to run text classification use cases. In the third post, we ran image segmentation use cases.
In this post, we provide a step-by-step walkthrough on how to deploy pre-trained text generation models. We explore two ways of obtaining the same result: via JumpStart’s graphical interface on Amazon SageMaker Studio, and programmatically through JumpStart APIs.
If you want to jump straight into the JumpStart API code we go through in this post, you can refer to the following sample Jupyter notebook: Introduction to JumpStart – Text Generation.
JumpStart helps you get started with ML models for a variety of tasks without writing a single line of code. Currently, JumpStart enables you to do the following:
JumpStart accepts custom VPC settings and AWS Key Management Service (AWS KMS) encryption keys, so you can use the available models and solutions securely within your enterprise environment. You can pass your security settings to JumpStart within Studio or through the SageMaker Python SDK.
Text generation is the task of generating text that is fluent and appears indistinguishable from human-written text. It is also known as natural language generation.
GPT-2 is a popular transformer-based text generation model. It’s pre-trained on a large corpus of raw English text with no human labeling. It’s trained on the task where, given a partial sequence (sentence or piece of text), the model needs to predict the next word or token in the sequence.
Bloom is also a transformer-based text generation model and trained similarly to GPT-2. However, Bloom is pre-trained on 46 different languages and 13 programming languages. The following is an example of running text generation with the Bloom model:
The following sections provide a step-by-step demo to perform inference, both via the Studio UI and via JumpStart APIs. We walk through the following steps:
In this section, we demonstrate how to train and deploy JumpStart models through the Studio UI.
The following video shows you how to find a pre-trained text generation model on JumpStart and deploy it. The model page contains valuable information about the model and how to use it. You can deploy any of the pre-trained models available in JumpStart. For inference, we pick the ml.p3.2xlarge instance type, because it provides the GPU acceleration needed for low inference latency at a low price point. After you configure the SageMaker hosting instance, choose Deploy. It may take 20–25 minutes until your persistent endpoint is up and running.
Once your endpoint is operational, it’s ready to respond to inference requests!
To accelerate your time to inference, JumpStart provides a sample notebook that shows you how to run inference on your freshly deployed endpoint. Choose Open Notebook under Use Endpoint from Studio.
In the preceding section, we showed how you can use the JumpStart UI to deploy a pre-trained model interactively, in a matter of a few clicks. However, you can also use JumpStart’s models programmatically by using APIs that are integrated into the SageMaker SDK.
In this section, we go over a quick example of how you can replicate the preceding process with the SageMaker SDK. We choose an appropriate pre-trained model in JumpStart, deploy this model to a SageMaker endpoint, and run inference on the deployed endpoint. All the steps in this demo are available in the accompanying notebook Introduction to JumpStart – Text Generation.
SageMaker is a platform that makes extensive use of Docker containers for build and runtime tasks. JumpStart uses the available framework-specific SageMaker Deep Learning Containers (DLCs). We first fetch any additional packages, as well as scripts to handle training and inference for the selected task. Finally, the pre-trained model artifacts are separately fetched with model_uris
, which provides flexibility to the platform. You can use any number of models pre-trained on the same task with a single inference script. See the following code:
Bloom is a very large model and can take up to 20–25 minutes to deploy. You can also use a smaller model such as GPT-2. To deploy a pre-trained GPT-2 model, you can set model_id = huggingface-textgeneration-gpt2
. For a list of other available models in JumpStart, refer to JumpStart Available Model Table.
Next, we feed the resources into a SageMaker model instance and deploy an endpoint:
After our model is deployed, we can get predictions from it in real time!
The following code snippet gives you a glimpse of what the outputs look like. To send requests to a deployed model, input text needs to be supplied in a utf-8
encoded format.
The endpoint response is a JSON object containing the input text followed by the generated text:
Our output is as follows:
In this post, we showed how to deploy a pre-trained text generation model using JumpStart. You can accomplish this without needing to write code. Try out the solution on your own and send us your comments. To learn more about JumpStart and how you can use open-source pre-trained models for a variety of other ML tasks, check out the following AWS re:Invent 2020 video.
Our next iteration of the FSF sets out stronger security protocols on the path to…
Large neural networks pretrained on web-scale corpora are central to modern machine learning. In this…
Generative AI has revolutionized technology through generating content and solving complex problems. To fully take…
At Google Cloud, we're deeply invested in making AI helpful to organizations everywhere — not…
Advanced Micro Devices reported revenue of $7.658 billion for the fourth quarter, up 24% from…