The demand for multi-object tracking (MOT) in video analysis has increased significantly in many industries, such as live sports, manufacturing, and traffic monitoring. For example, in live sports, MOT can track soccer players in real time to analyze physical performance metrics such as speed and distance covered.
Since its introduction in 2021, ByteTrack has remained one of the best-performing methods on various benchmark datasets among the latest model developments for MOT applications. In ByteTrack, the authors proposed a simple, effective, and generic data association method (referred to as BYTE) for detection box and tracklet matching. Rather than keeping only the high-score detection boxes, it also keeps the low-score detection boxes, which can help recover unmatched tracklets when occlusion, motion blur, or size changes occur. The BYTE association strategy can also be used in other Re-ID-based trackers, such as FairMOT. Experiments showed improvements over the vanilla tracker algorithms. For example, FairMOT achieved an improvement of 1.3% on MOTA (FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking), one of the main metrics in the MOT task, when applying BYTE in data association.
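To make the two-stage idea concrete, the following minimal sketch illustrates a BYTE-style association. It is an illustration only, not the official ByteTrack implementation; the simple IoU matcher, the 0.6/0.1 score thresholds, and the 0.9 cost cutoff are assumptions chosen for clarity.

```python
# Illustrative sketch of BYTE-style two-stage association (not the official ByteTrack code).
# Thresholds and the brute-force IoU matcher are assumptions chosen for readability.
import numpy as np
from scipy.optimize import linear_sum_assignment


def iou_matrix(track_boxes, det_boxes):
    """Pairwise IoU between two lists of [x1, y1, x2, y2] boxes."""
    ious = np.zeros((len(track_boxes), len(det_boxes)))
    for i, t in enumerate(track_boxes):
        for j, d in enumerate(det_boxes):
            x1, y1 = max(t[0], d[0]), max(t[1], d[1])
            x2, y2 = min(t[2], d[2]), min(t[3], d[3])
            inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
            union = ((t[2] - t[0]) * (t[3] - t[1])
                     + (d[2] - d[0]) * (d[3] - d[1]) - inter)
            ious[i, j] = inter / union if union > 0 else 0.0
    return ious


def byte_associate(tracks, boxes, scores, high_thresh=0.6, low_thresh=0.1):
    """Match tracks to high-score detections first, then try to recover the
    still-unmatched tracks with low-score detections (the core idea of BYTE)."""
    high = [b for b, s in zip(boxes, scores) if s >= high_thresh]
    low = [b for b, s in zip(boxes, scores) if low_thresh <= s < high_thresh]

    def match(track_indices, dets):
        if not track_indices or not dets:
            return [], list(track_indices)
        cost = 1.0 - iou_matrix([tracks[i] for i in track_indices], dets)
        rows, cols = linear_sum_assignment(cost)
        matches, used = [], set()
        for r, c in zip(rows, cols):
            if cost[r, c] < 0.9:  # require some spatial overlap
                matches.append((track_indices[r], dets[c]))
                used.add(r)
        unmatched = [track_indices[r] for r in range(len(track_indices)) if r not in used]
        return matches, unmatched

    # Stage 1: all tracks vs. high-score detections.
    matches_high, unmatched = match(list(range(len(tracks))), high)
    # Stage 2: recover remaining tracks with low-score detections (occlusion, blur).
    matches_low, lost = match(unmatched, low)
    return matches_high + matches_low, lost
```

Calling byte_associate with the previous frame's track boxes and the current frame's detections returns the matched pairs plus the tracks that remain unmatched after both stages.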
In the post Train and deploy a FairMOT model with Amazon SageMaker, we demonstrated how to train and deploy a FairMOT model with Amazon SageMaker on the MOT challenge datasets. When applying a MOT solution in real-world cases, you need to train or fine-tune a MOT model on a custom dataset. With Amazon SageMaker Ground Truth, you can effectively create labels on your own video dataset.
Following on from the previous post, we have added the following contributions and modifications:
We also provide the code sample on GitHub, which uses SageMaker for labeling, building, training, and inference.
SageMaker is a fully managed service that provides every developer and data scientist with the ability to prepare, build, train, and deploy machine learning (ML) models quickly. SageMaker provides several built-in algorithms and container images that you can use to accelerate training and deployment of ML models. Additionally, custom algorithms such as ByteTrack can be supported via custom-built Docker container images. For more information about deciding on the right level of engagement with containers, refer to Using Docker containers with SageMaker.
SageMaker provides plenty of options for model deployment, such as real-time inference, serverless inference, and asynchronous inference. In this post, we show how to deploy a tracking model with different deployment options, so that you can choose the suitable deployment method in your own use case.
Our solution consists of the following high-level steps:
The following diagram illustrates the architecture in each step.
Before getting started, complete the following prerequisites:
- Access to the us-east-1 Region, which is used throughout this post.
- A GPU instance (for example, ml.p3.2xlarge) for single GPU training, or ml.p3.16xlarge for the distributed training job. Other types of GPU instances are also supported, with various performance differences.
- A GPU instance (for example, ml.p3.2xlarge) for the inference endpoint.
- A GPU instance (for example, ml.p3.2xlarge) for running batch prediction with processing jobs.

If this is your first time running SageMaker services on the aforementioned instance types, you may have to request a quota increase for the required instances.
After you complete all the prerequisites, you’re ready to deploy the solution.
Create a SageMaker notebook instance with the ml.t3.medium instance type. While running the code, we use docker build to extend the SageMaker training image with the ByteTrack code (the docker build command is run locally within the notebook instance environment). Therefore, we recommend increasing the volume size to 100 GB (the default volume size is 5 GB) from the advanced configuration options. For your AWS Identity and Access Management (IAM) role, choose an existing role or create a new role, and attach the AmazonS3FullAccess, AmazonSNSFullAccess, AmazonSageMakerFullAccess, and AmazonElasticContainerRegistryPublicFullAccess policies to the role.

Clone the GitHub repo to the /home/ec2-user/SageMaker folder on the notebook instance you created.

In the data-preparation.ipynb notebook, we download an MOT16 test video file and split it into small video files with 200 frames each. Then we upload those video files to the S3 bucket as the data source for labeling.
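The following sketch approximates what the notebook does, assuming OpenCV for the splitting and boto3 for the upload; the input file name, bucket, and prefix are placeholders rather than the exact names used in the sample code.

```python
# Hypothetical sketch of the data-preparation step: split a video into 200-frame
# chunks and upload the chunks to S3 as the labeling data source.
import cv2
import boto3


def split_video(src_path, frames_per_chunk=200):
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    chunk_paths, writer, frame_idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % frames_per_chunk == 0:
            if writer:
                writer.release()
            out_path = f"chunk_{frame_idx // frames_per_chunk:03d}.mp4"
            writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
            chunk_paths.append(out_path)
        writer.write(frame)
        frame_idx += 1
    if writer:
        writer.release()
    cap.release()
    return chunk_paths


# Bucket name, prefix, and source file are placeholders.
s3 = boto3.client("s3")
for path in split_video("MOT16-03.mp4"):
    s3.upload_file(path, "my-labeling-bucket", f"mot/videos/{path}")
```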
To label the dataset for the MOT task, refer to Getting started. When the labeling job is complete, we can access the annotation directory at the job output location in the S3 bucket.
The manifests directory should contain an output folder if we finished labeling all the files. We can see the file output.manifest in the output folder. This manifest file contains information about the video and video tracking labels that you can use later to train and test a model.
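Because output.manifest is a JSON Lines file, each line can be parsed independently. The following snippet is a minimal sketch for inspecting it; the bucket, key, and the exact label attribute names depend on your labeling job configuration and are placeholders here.

```python
# Hypothetical sketch: read the Ground Truth output.manifest (a JSON Lines file)
# from S3 and inspect each entry. Bucket and key are placeholders.
import json
import boto3

s3 = boto3.client("s3")
obj = s3.get_object(
    Bucket="my-labeling-bucket",
    Key="labeling-job/manifests/output/output.manifest",
)

for line in obj["Body"].read().decode("utf-8").splitlines():
    entry = json.loads(line)
    # Each entry points at a source video and carries (or references) the
    # tracking annotations produced by the labeling job.
    print(entry.keys())
```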
To train your ByteTrack model, we use the bytetrack-training.ipynb notebook. The notebook consists of the following steps:
In particular, for data preprocessing, we need to convert the labeled dataset from the Ground Truth output format to the MOT17 format, and then convert the MOT17 format dataset to an MSCOCO format dataset (as shown in the following figure) so that we can train a YOLOX model on the custom dataset. Because we keep both the MOT format dataset and the MSCOCO format dataset, you can train other MOT algorithms that don't separate detection and tracking on the MOT format dataset. You can also easily change the detector to another algorithm, such as YOLO7, to use your existing object detection algorithm.
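With the MSCOCO-format dataset in place, the training job can be launched with a SageMaker Estimator that points at the extended ByteTrack training image mentioned earlier. The following is a sketch only; the image URI, hyperparameters, channel name, and S3 paths are placeholders rather than the exact values used in the notebook.

```python
# Hypothetical sketch of launching the ByteTrack training job with a custom image.
# Image URI, hyperparameters, channel name, and S3 paths are placeholders.
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()
role = sagemaker.get_execution_role()

estimator = Estimator(
    image_uri="<account-id>.dkr.ecr.us-east-1.amazonaws.com/bytetrack-training:latest",
    role=role,
    instance_count=1,                 # increase for the distributed training job
    instance_type="ml.p3.2xlarge",
    hyperparameters={"num_epochs": 20, "batch_size": 4},
    output_path=f"s3://{session.default_bucket()}/bytetrack/output",
)

# The channel name must match what the training script expects.
estimator.fit({"training": f"s3://{session.default_bucket()}/bytetrack/mscoco"})
```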
After we train the YOLOX model, we deploy the trained model for inference. SageMaker provides several options for model deployment, such as real-time inference, asynchronous inference, serverless inference, and batch inference. In this post, we provide sample code for real-time inference, asynchronous inference, and batch inference; you can choose the option that best fits your own business requirements.
Because SageMaker batch transform requires the data to be partitioned and stored on Amazon S3 as input and the invocations are sent to the inference endpoints concurrently, it doesn’t meet the requirements in object tracking tasks where the targets need to be sent in a sequential manner. Therefore, we don’t use the SageMaker batch transform jobs to run the batch inference. In this example, we use SageMaker processing jobs to do batch inference.
The following table summarizes the configuration for our inference jobs.
| Inference Type | Payload | Processing Time | Auto Scaling |
| --- | --- | --- | --- |
| Real-time | Up to 6 MB | Up to 1 minute | Minimum instance count is 1 or higher |
| Asynchronous | Up to 1 GB | Up to 15 minutes | Minimum instance count can be zero |
| Batch (with processing job) | No limit | No limit | Not supported |
To deploy a real-time inference endpoint, we can run the bytetrack-inference-yolox.ipynb notebook. We separate ByteTrack inference into object detection and tracking. In the inference endpoint, we only run the YOLOX model for object detection. In the notebook, we create a tracking object, receive the result of object detection from the inference endpoint, and update trackers.
We use the SageMaker PyTorchModel SDK to create and deploy a ByteTrack model as follows:
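A minimal sketch of this step looks like the following; the model artifact path, entry point, and framework versions are assumptions for illustration, not the exact values from the sample code.

```python
# Minimal sketch of creating and deploying the detection model with PyTorchModel.
# The S3 model artifact, entry point, and framework/Python versions are placeholders.
import sagemaker
from sagemaker.pytorch import PyTorchModel

role = sagemaker.get_execution_role()

pytorch_model = PyTorchModel(
    model_data="s3://my-bucket/bytetrack/model.tar.gz",
    role=role,
    entry_point="inference.py",
    framework_version="1.12",
    py_version="py38",
)

predictor = pytorch_model.deploy(
    initial_instance_count=1,
    instance_type="ml.p3.2xlarge",
)
```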
After we deploy the model to an endpoint successfully, we can invoke the inference endpoint with the following code snippet:
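For example, a single frame can be encoded and sent to the endpoint roughly as follows; the endpoint name and the payload format expected by the inference script are assumptions.

```python
# Hypothetical sketch of invoking the real-time endpoint with one video frame.
# Endpoint name and the payload/response format expected by inference.py are assumptions.
import json
import cv2
import boto3

runtime = boto3.client("sagemaker-runtime")

cap = cv2.VideoCapture("input_video.mp4")
ok, frame = cap.read()
if ok:
    _, encoded = cv2.imencode(".jpg", frame)
    response = runtime.invoke_endpoint(
        EndpointName="bytetrack-endpoint",
        ContentType="application/x-image",
        Body=encoded.tobytes(),
    )
    detections = json.loads(response["Body"].read())  # e.g., boxes and scores
cap.release()
```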
We run the tracking task on the client side after receiving the detection result from the endpoint (see the following code). By drawing the tracking results on each frame and saving them as a tracking video, you can visually confirm the result.
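A minimal version of that client-side loop could look like this sketch. It assumes the ByteTrack repository is importable locally (for its BYTETracker class) and that get_detections is a hypothetical wrapper around the endpoint call shown earlier, returning boxes in original-frame coordinates.

```python
# Hypothetical sketch of client-side tracking with BYTETracker from the ByteTrack repo.
# get_detections() is a hypothetical wrapper around the endpoint invocation above;
# it should return rows of [x1, y1, x2, y2, score] in original-frame coordinates.
import cv2
import numpy as np
from yolox.tracker.byte_tracker import BYTETracker  # from the ByteTrack repository


class TrackerArgs:
    # Typical ByteTrack settings; tune them for your own data.
    track_thresh = 0.5
    track_buffer = 30
    match_thresh = 0.8
    mot20 = False


tracker = BYTETracker(TrackerArgs(), frame_rate=30)
cap = cv2.VideoCapture("input_video.mp4")
while True:
    ok, frame = cap.read()
    if not ok:
        break
    dets = np.array(get_detections(frame), dtype=np.float32)  # hypothetical helper
    if len(dets) == 0:
        dets = np.zeros((0, 5), dtype=np.float32)
    # Pass the frame size as both img_info and img_size so no rescaling is applied
    # (the detections are assumed to already be in original-frame coordinates).
    frame_size = (frame.shape[0], frame.shape[1])
    online_targets = tracker.update(dets, frame_size, frame_size)
    for t in online_targets:
        x, y, w, h = t.tlwh
        cv2.rectangle(frame, (int(x), int(y)), (int(x + w), int(y + h)), (0, 255, 0), 2)
        cv2.putText(frame, str(t.track_id), (int(x), int(y) - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
cap.release()
```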
SageMaker asynchronous inference is the ideal option for requests with large payload sizes (up to 1 GB), long processing times (up to 1 hour), and near-real-time latency requirements. For MOT tasks, it's common for a video file to exceed 6 MB, which is the payload limit of a real-time endpoint. Therefore, we deploy an asynchronous inference endpoint. Refer to Asynchronous inference for more details on how to deploy an asynchronous endpoint. We can reuse the model created for the real-time endpoint; for this post, we put a tracking process into the inference script so that we can get the final tracking result directly for the input video.
To use scripts related to ByteTrack on the endpoint, we need to put the tracking script and model into the same folder, compress the folder as the model.tar.gz file, and then upload it to the S3 bucket for model creation. The following diagram shows the structure of model.tar.gz.
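As a rough guide, one plausible layout is shown below; the file names are assumptions for illustration, not necessarily the exact contents used by the sample code.

```
model.tar.gz
├── inference.py      # entry point: runs YOLOX detection plus the tracking step
├── yolox_model.pth   # trained YOLOX weights
└── yolox/            # ByteTrack/YOLOX source files needed by the inference script
```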
We need to explicitly set the request size, response size, and response timeout as the environment variables, as shown in the following code. The name of the environment variable varies depending on the framework. For more details, refer to Create an Asynchronous Inference Endpoint.
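For the SageMaker PyTorch (TorchServe-based) container, that typically means setting the TS_* variables when creating the model, along the lines of the following sketch; the values, paths, and names are examples rather than required settings.

```python
# Sketch of creating the model for the asynchronous endpoint with larger
# request/response limits. Model artifact, entry point, and values are placeholders.
import sagemaker
from sagemaker.pytorch import PyTorchModel
from sagemaker.async_inference import AsyncInferenceConfig

role = sagemaker.get_execution_role()

async_model = PyTorchModel(
    model_data="s3://my-bucket/bytetrack/model.tar.gz",
    role=role,
    entry_point="inference.py",
    framework_version="1.12",
    py_version="py38",
    env={
        "TS_MAX_REQUEST_SIZE": "1000000000",    # ~1 GB request payload
        "TS_MAX_RESPONSE_SIZE": "1000000000",   # ~1 GB response payload
        "TS_DEFAULT_RESPONSE_TIMEOUT": "900",   # 15-minute response timeout
    },
)

async_predictor = async_model.deploy(
    initial_instance_count=1,
    instance_type="ml.p3.2xlarge",
    async_inference_config=AsyncInferenceConfig(
        output_path="s3://my-bucket/bytetrack/async-output"
    ),
)
```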
When invoking the asynchronous endpoint, instead of sending the payload in the request, we send the Amazon S3 URL of the input video. When the model inference finishes processing the video, the results will be saved on the S3 output path. We can configure Amazon Simple Notification Service (Amazon SNS) topics so that when the results are ready, we can receive an SNS message as a notification.
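Invoking the asynchronous endpoint then looks roughly like the following; the endpoint name and S3 URIs are placeholders.

```python
# Hypothetical sketch of invoking the asynchronous endpoint with an S3 URI
# instead of an inline payload. Endpoint name and S3 paths are placeholders.
import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint_async(
    EndpointName="bytetrack-async-endpoint",
    InputLocation="s3://my-bucket/bytetrack/input/test_video.mp4",
    ContentType="video/mp4",
)
# The tracking result appears at this S3 location when processing finishes.
print(response["OutputLocation"])
```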
For video files bigger than 1 GB, we use a SageMaker processing job to do batch inference. We define a custom Docker container to run a SageMaker processing job (see the following code). We draw the tracking result on the input video. You can find the result video in the S3 bucket defined by s3_output.
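A sketch of how such a processing job can be launched with the SageMaker Python SDK follows; the image URI, script name, and S3 paths are placeholders for the values used in the sample code.

```python
# Hypothetical sketch of running batch inference with a SageMaker processing job
# and a custom ByteTrack container. Image URI, script name, and S3 paths are placeholders.
import sagemaker
from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput

role = sagemaker.get_execution_role()
s3_input = "s3://my-bucket/bytetrack/batch-input"
s3_output = "s3://my-bucket/bytetrack/batch-output"

processor = ScriptProcessor(
    image_uri="<account-id>.dkr.ecr.us-east-1.amazonaws.com/bytetrack-inference:latest",
    command=["python3"],
    role=role,
    instance_count=1,
    instance_type="ml.p3.2xlarge",
)

processor.run(
    code="batch_inference.py",  # tracking script that reads videos and writes results
    inputs=[ProcessingInput(source=s3_input, destination="/opt/ml/processing/input")],
    outputs=[ProcessingOutput(source="/opt/ml/processing/output", destination=s3_output)],
)
```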
To avoid unnecessary costs, delete the resources you created as part of this solution, including the inference endpoint.
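For example, the endpoints and their endpoint configurations can be removed as follows; the endpoint names are placeholders for the ones you actually created, and the configuration names assume the SDK default of matching the endpoint name.

```python
# Sketch of cleaning up the endpoints created in this walkthrough.
# Endpoint names are placeholders.
import boto3

sm = boto3.client("sagemaker")
for endpoint_name in ["bytetrack-endpoint", "bytetrack-async-endpoint"]:
    sm.delete_endpoint(EndpointName=endpoint_name)
    sm.delete_endpoint_config(EndpointConfigName=endpoint_name)
```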
This post demonstrated how to implement a multi-object tracking solution on a custom dataset using one of the state-of-the-art algorithms on SageMaker. We also demonstrated three deployment options on SageMaker so that you can choose the optimal option for your own business scenario. If the use case requires low latency and needs a model to be deployed on an edge device, you can deploy the MOT solution at the edge with AWS Panorama.
For more information, refer to Multi Object Tracking using YOLOX + BYTE-TRACK and data analysis.