Amazon SageMaker provides a suite of built-in algorithms, pre-trained models, and pre-built solution templates to help data scientists and machine learning (ML) practitioners get started training and deploying ML models quickly. You can use these algorithms and models for both supervised and unsupervised learning. They can process various types of input data, including tabular, image, and text.
This post is the third in a series on the new built-in algorithms in SageMaker. In the first post, we showed how SageMaker provides a built-in algorithm for image classification. In the second post, we showed how SageMaker provides a built-in algorithm for object detection. Today, we announce that SageMaker provides a new built-in algorithm for text classification using TensorFlow. This supervised learning algorithm supports transfer learning for many pre-trained models available in TensorFlow hub. It takes a piece of text as input and outputs the probability for each of the class labels. You can fine-tune these pre-trained models using transfer learning even when a large corpus of text isn’t available. It’s available through the SageMaker built-in algorithms, as well as through the SageMaker JumpStart UI in Amazon SageMaker Studio. For more information, refer to Text Classification and the example notebook Introduction to JumpStart – Text Classification.
Text Classification with TensorFlow in SageMaker provides transfer learning on many pre-trained models available in the TensorFlow Hub. According to the number of class labels in the training data, a classification layer is attached to the pre-trained TensorFlow hub model. The classification layer consists of a dropout layer and a dense layer, fully connected layer, with 2-norm regularizer, which is initialized with random weights. The model training has hyper-parameters for the dropout rate of dropout layer, and L2 regularization factor for the dense layer. Then, either the whole network, including the pre-trained model, or only the top classification layer can be fine-tuned on the new training data. In this transfer learning mode, training can be achieved even with a smaller dataset.
This section describes how to use the TensorFlow text classification algorithm with the SageMaker Python SDK. For information on how to use it from the Studio UI, see SageMaker JumpStart.
The algorithm supports transfer learning for the pre-trained models listed in Tensorflow models. Each model is identified by a unique model_id
. The following code shows how to fine-tune BERT base model identified by model_id
tensorflow-tc-bert-en-uncased-L-12-H-768-A-12-2
on a custom training dataset. For each model_id
, to launch a SageMaker training job through the Estimator class of the SageMaker Python SDK, you must fetch the Docker image URI, training script URI, and pre-trained model URI through the utility functions provided in SageMaker. The training script URI contains all of the necessary code for data processing, loading the pre-trained model, model training, and saving the trained model for inference. The pre-trained model URI contains the pre-trained model architecture definition and the model parameters. The pre-trained model URI is specific to the particular model. The pre-trained model tarballs have been pre-downloaded from TensorFlow and saved with the appropriate model signature in Amazon Simple Storage Service (Amazon S3) buckets, so that the training job runs in network isolation. See the following code:
With these model-specific training artifacts, you can construct an object of the Estimator class:
Next, for transfer learning on your custom dataset, you might need to change the default values of the training hyperparameters, which are listed in Hyperparameters. You can fetch a Python dictionary of these hyperparameters with their default values by calling hyperparameters.retrieve_default
, update them as needed, and then pass them to the Estimator class. Note that the default values of some of the hyperparameters are different for different models. For large models, the default batch size is smaller and the train_only_top_layer
hyperparameter is set to True
. The hyperparameter Train_only_top_layer
defines which model parameters change during the fine-tuning process. If train_only_top_layer
is True
, then parameters of the classification layers change and the rest of the parameters remain constant during the fine-tuning process. On the other hand, if train_only_top_layer
is False
, then all of the parameters of the model are fine-tuned. See the following code:
We provide the SST2 as a default dataset for fine-tuning the models. The dataset contains positive and negative movie reviews. It has been downloaded from TensorFlow under Apache 2.0 License. The following code provides the default training dataset hosted in S3 buckets.
Finally, to launch the SageMaker training job for fine-tuning the model, call .fit on the object of the Estimator class, while passing the Amazon S3 location of the training dataset:
For more information about how to use the new SageMaker TensorFlow text classification algorithm for transfer learning on a custom dataset, deploy the fine-tuned model, run inference on the deployed model, and deploy the pre-trained model as is without first fine-tuning on a custom dataset, see the following example notebook: Introduction to JumpStart – Text Classification.
You can fine-tune each of the pre-trained models listed in TensorFlow Models to any given dataset made up of text sentences with any number of classes. The pre-trained model attaches a classification layer to the Text Embedding model and initializes the layer parameters to random values. The output dimension of the classification layer is determined based on the number of classes detected in the input data. The objective is to minimize classification errors on the input data. The model returned by fine-tuning can be further deployed for inference.
The following instructions describe how the training data should be formatted for input to the model:
The following is an example of an input CSV file. Note that the file should not have any header. The file should be hosted in an S3 bucket with a path similar to the following: s3://bucket_name/input_directory/
. Note that the trailing /
is required.
The generated models can be hosted for inference and support text as the application/x-text
content type. The output contains the probability values, class labels for all of the classes, and the predicted label corresponding to the class index with the highest probability encoded in the JSON format. The model processes a single string per request and outputs only one line. The following is an example of a JSON format response:
If accept
is set to application/json
, then the model only outputs probabilities. For more details on training and inference, see the sample notebook Introduction to Introduction to JumpStart – Text Classification.
You can also use SageMaker TensorFlow text classification and any of the other built-in algorithms with a few clicks via the JumpStart UI. JumpStart is a SageMaker feature that lets you train and deploy built-in algorithms and pre-trained models from various ML frameworks and model hubs through a graphical interface. Furthermore, it lets you deploy fully-fledged ML solutions that string together ML models and various other AWS services to solve a targeted use case.
Following are two videos that show how you can replicate the same fine-tuning and deployment process we just went through with a few clicks via the JumpStart UI.
Here is the process to fine-tune the same pre-trained text classification model.
After model training is finished, you can directly deploy the model to a persistent, real-time endpoint with one click.
In this post, we announced the launch of the SageMaker TensorFlow text classification built-in algorithm. We provided example code for how to do transfer learning on a custom dataset using a pre-trained model from TensorFlow hub using this algorithm.
For more information, check out the documentation and the example notebook Introduction to JumpStart – Text Classification.
Podcasts are a fun and easy way to learn about machine learning.
TL;DR We asked o1 to share its thoughts on our recent LNM/LMM post. https://www.artificial-intelligence.show/the-ai-podcast/o1s-thoughts-on-lnms-and-lmms What…
Palantir and Grafana Labs’ Strategic PartnershipIntroductionIn today’s rapidly evolving technological landscape, government agencies face the…
Amazon SageMaker Pipelines includes features that allow you to streamline and automate machine learning (ML)…
When it comes to AI, large language models (LLMs) and machine learning (ML) are taking…
Cohere's Command R7B uses RAG, features a context length of 128K, supports 23 languages and…