This post is co-written with Abhishek Sawarkar, Eliuth Triana, Jiahong Liu and Kshitiz Gupta from NVIDIA.
At re:Invent 2024, we are excited to announce new capabilities to speed up your AI inference workloads with NVIDIA accelerated computing and software offerings on Amazon SageMaker. These advancements build upon our collaboration with NVIDIA, which includes adding support for inference-optimized GPU instances and integration with NVIDIA technologies. They represent our continued commitment to delivering scalable, cost-effective, and flexible GPU-accelerated AI inference capabilities to our customers.
Today, we are introducing three key advancements that further expand our AI inference capabilities: the availability of NVIDIA NIM microservices in AWS Marketplace for SageMaker Inference, NVIDIA models on SageMaker JumpStart starting with Nemotron-4, and new inference-optimized P5e and G6e instances.
In this post, we will explore how you can use these new capabilities to enhance your AI inference on Amazon SageMaker. We’ll walk through the process of deploying NVIDIA NIM microservices from AWS Marketplace for SageMaker Inference. We’ll then dive into NVIDIA’s model offerings on SageMaker JumpStart, showcasing how to access and deploy the Nemotron-4 model directly in the JumpStart interface. This will include step-by-step instructions on how to find the Nemotron-4 model in the JumpStart catalog, select it for your use case, and deploy it with a few clicks. We’ll also demonstrate how to fine-tune and optimize this model for your specific requirements. Additionally, we’ll introduce you to the new inference-optimized P5e and G6e instances powered by NVIDIA H200 and L40S GPUs, showcasing how they can significantly boost your AI inference performance. By the end of this post, you’ll have a practical understanding of how to implement these advancements in your own AI projects, enabling you to accelerate your inference workloads and drive innovation in your organization.
NVIDIA NIM, part of the NVIDIA AI Enterprise software platform, offers a set of high-performance microservices designed to help organizations rapidly deploy and scale generative AI applications on NVIDIA-accelerated infrastructure. SageMaker Inference is a fully managed capability for customers to run generative AI and machine learning models at scale, providing purpose-built features and a broad array of inference-optimized instances. AWS Marketplace serves as a curated digital catalog where customers can find, buy, deploy, and manage third-party software, data, and services needed to build solutions and run businesses. We're excited to announce that AWS customers can now access NVIDIA NIM microservices for SageMaker Inference deployments through AWS Marketplace, simplifying the deployment of generative AI models and helping partners and enterprises scale their AI capabilities. The initial availability includes a portfolio of models packaged as NIM microservices, expanding the options for AI inference on Amazon SageMaker.
Deploying NVIDIA NIM microservices from AWS Marketplace follows the standard Marketplace flow: subscribe to the model package, create a SageMaker model from the package, and deploy it to a SageMaker Inference endpoint.
The availability of NVIDIA NIM microservices in AWS Marketplace facilitates seamless deployment in SageMaker, so that organizations across various industries can develop, deploy, and scale their generative AI applications more quickly and effectively than ever.
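The Marketplace flow above can be sketched with the SageMaker Python SDK. Note that the model package ARN, endpoint name, and instance type below are placeholders for illustration; copy the real ARN from your AWS Marketplace subscription page:

```python
# Sketch: deploying an AWS Marketplace model package (such as a NIM
# microservice) to a SageMaker endpoint with the SageMaker Python SDK.
# The ARN below is a hypothetical placeholder, not a real package.

MODEL_PACKAGE_ARN = (
    "arn:aws:sagemaker:us-east-1:123456789012:"
    "model-package/nim-example-placeholder"  # hypothetical ARN
)


def deploy_marketplace_nim(model_package_arn: str,
                           instance_type: str = "ml.g6e.12xlarge",
                           endpoint_name: str = "nim-endpoint"):
    """Create a SageMaker model from a Marketplace package and deploy it."""
    # Imports kept inside the function so the sketch can be read and
    # sanity-checked without the SageMaker SDK installed locally.
    import sagemaker
    from sagemaker import ModelPackage

    session = sagemaker.Session()
    model = ModelPackage(
        role=sagemaker.get_execution_role(),
        model_package_arn=model_package_arn,
        sagemaker_session=session,
    )
    # deploy() creates the endpoint config and endpoint, then returns a
    # Predictor for invoking the model.
    predictor = model.deploy(
        initial_instance_count=1,
        instance_type=instance_type,
        endpoint_name=endpoint_name,
    )
    return predictor


if __name__ == "__main__":
    deploy_marketplace_nim(MODEL_PACKAGE_ARN)
```

Running this requires an AWS account with an active subscription to the model package and a SageMaker execution role with the appropriate permissions.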
SageMaker JumpStart is a model hub and no-code solution within SageMaker that makes advanced AI inference capabilities more accessible to AWS customers by providing a streamlined path to access and deploy popular models from different providers. It offers an intuitive interface where organizations can easily deploy popular AI models with a few clicks, eliminating the complexity typically associated with model deployment and infrastructure management. The integration offers enterprise-grade features including model evaluation metrics, fine-tuning and customization capabilities, and collaboration tools, all while giving customers full control of their deployment.
We are excited to announce that NVIDIA models are now available in SageMaker JumpStart, marking a significant milestone in our ongoing collaboration. This integration brings NVIDIA's cutting-edge AI models directly to SageMaker Inference customers, starting with the powerful Nemotron-4 model. With JumpStart, customers can access NVIDIA's state-of-the-art models within the SageMaker ecosystem, combining NVIDIA's AI models with the scalable, price-performant inference infrastructure of SageMaker.
We are also excited to announce that NVIDIA Nemotron-4 is now available in the SageMaker JumpStart model hub. Nemotron-4 is a cutting-edge LLM designed to generate diverse synthetic data that closely mimics real-world data, enhancing the performance and robustness of custom LLMs across various domains. Compact yet powerful, it has been fine-tuned on carefully curated datasets that emphasize high-quality sources and underrepresented domains. This refined approach enables strong results in commonsense reasoning, mathematical problem-solving, and programming tasks. Moreover, Nemotron-4 exhibits outstanding multilingual capabilities compared to similarly sized models, and even outperforms those over four times larger and those explicitly specialized for multilingual tasks.
Nemotron-4 demonstrates strong performance on commonsense reasoning tasks such as SIQA, ARC, PIQA, and HellaSwag, with an average score of 73.4, outperforming similarly sized models and matching larger ones such as Llama-2 34B. Its multilingual capabilities also surpass specialized models like mGPT 13B and XGLM 7.5B on benchmarks like XCOPA and TyDiQA, highlighting its versatility and efficiency. When deployed through NVIDIA NIM microservices on SageMaker, these models deliver optimized inference performance, allowing businesses to generate and validate synthetic data quickly and accurately.
Through SageMaker JumpStart, customers can access pre-optimized models from NVIDIA that significantly simplify deployment and management. These containers are specifically tuned for NVIDIA GPUs on AWS, providing optimal performance out of the box. NIM microservices deliver efficient deployment and scaling, allowing organizations to focus on their use cases rather than infrastructure management.
SageMaker JumpStart provides an additional streamlined path to access and deploy NVIDIA NIM microservices, making advanced AI capabilities even more accessible to AWS customers. Through JumpStart’s intuitive interface, organizations can deploy Nemotron models with a few clicks, eliminating the complexity typically associated with model deployment and infrastructure management. The integration offers enterprise-grade features including model evaluation metrics, customization capabilities, and collaboration tools, all while maintaining data privacy within the customer’s VPC. This comprehensive integration enables organizations to accelerate their AI initiatives while using the combined strengths of the scalable infrastructure provided by AWS and NVIDIA’s optimized models.
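Beyond the few-clicks console flow, JumpStart models can also be deployed programmatically with the SageMaker Python SDK. The model ID and instance type below are placeholders; look up the exact Nemotron-4 identifier in the JumpStart catalog:

```python
# Sketch: deploying an NVIDIA model from SageMaker JumpStart with the
# SageMaker Python SDK. The model ID below is a hypothetical placeholder --
# find the real Nemotron-4 ID in the JumpStart model catalog.

JUMPSTART_MODEL_ID = "nvidia-nemotron-4-placeholder"  # hypothetical ID


def deploy_jumpstart_model(model_id: str,
                           instance_type: str = "ml.p5e.48xlarge"):
    """Deploy a JumpStart model to a SageMaker endpoint and return a Predictor."""
    # Import inside the function so the sketch can be read without the SDK.
    from sagemaker.jumpstart.model import JumpStartModel

    model = JumpStartModel(model_id=model_id)
    predictor = model.deploy(
        initial_instance_count=1,
        instance_type=instance_type,
        accept_eula=True,  # many JumpStart models require accepting an EULA
    )
    return predictor


if __name__ == "__main__":
    deploy_jumpstart_model(JUMPSTART_MODEL_ID)
```

The returned predictor can then be invoked with `predictor.predict(...)` against the hosted endpoint, keeping all traffic within your account's VPC configuration.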
SageMaker now supports new P5e and G6e instances, powered by NVIDIA GPUs for AI inference.
P5e instances use NVIDIA H200 Tensor Core GPUs for AI and machine learning. These instances offer 1.7 times larger GPU memory and 1.4 times higher memory bandwidth than previous generations. With eight powerful H200 GPUs per instance connected using NVIDIA NVLink for seamless GPU-to-GPU communication and blazing-fast 3,200 Gbps multi-node networking through EFA technology, P5e instances are purpose-built for deploying and training even the most demanding ML models. These instances deliver performance, reliability, and scalability for your cutting-edge inference applications.
G6e instances, powered by NVIDIA L40S GPUs, are one of the most cost-efficient GPU instances for deploying generative AI models and the highest-performance universal GPU instances for spatial computing, AI, and graphics workloads. They offer 2 times higher GPU memory (48 GB) and 2.9 times faster GPU memory bandwidth compared to G6 instances. G6e instances deliver up to 2.5 times better performance compared to G5 instances. Customers can use G6e instances to deploy LLMs and diffusion models for generating images, video, and audio. G6e instances feature up to eight NVIDIA L40S GPUs with 384 GB of total GPU memory (48 GB of memory per GPU) and third-generation AMD EPYC processors. They also support up to 192 vCPUs, up to 400 Gbps of network bandwidth, up to 1.536 TB of system memory, and up to 7.6 TB of local NVMe SSD storage.
Both instance families are now available on SageMaker Inference. Check out AWS Region availability and pricing on our pricing page.
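As a rough sizing sketch, the memory figures above can guide instance selection: G6e offers 48 GB per L40S GPU (up to eight GPUs), while each H200 GPU on P5e carries 141 GB of memory (a figure not stated in this post; treat it as an assumption). The helper and thresholds below are illustrative, not an official sizing guide:

```python
# Illustrative helper: pick the smallest instance whose aggregate GPU memory
# fits a model, leaving ~10% headroom for activations and KV cache.
# GPU counts and memory per GPU reflect published instance specs; the 10%
# headroom factor is an assumption, not an AWS recommendation.

HEADROOM = 0.9  # assumed usable fraction of total GPU memory

# (gpu_count, gpu_memory_gb, instance_type), smallest first
G6E_OPTIONS = [
    (1, 48, "ml.g6e.xlarge"),
    (4, 48, "ml.g6e.12xlarge"),
    (8, 48, "ml.g6e.48xlarge"),
]
P5E = (8, 141, "ml.p5e.48xlarge")  # eight H200 GPUs


def pick_instance(model_memory_gb: float) -> str:
    """Return the smallest instance type whose GPU memory fits the model."""
    for gpus, mem_per_gpu, name in G6E_OPTIONS:
        if model_memory_gb <= gpus * mem_per_gpu * HEADROOM:
            return name
    gpus, mem_per_gpu, name = P5E
    if model_memory_gb <= gpus * mem_per_gpu * HEADROOM:
        return name
    raise ValueError("Model exceeds single-instance GPU memory; shard it.")
```

For example, a model needing roughly 100 GB of GPU memory lands on a four-GPU G6e instance, while several hundred gigabytes pushes the choice to P5e.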
These new capabilities let you deploy NVIDIA NIM microservices on SageMaker through the AWS Marketplace, use new NVIDIA Nemotron models, and tap the latest GPU instance types to power your ML workloads. We encourage you to give these offerings a look and use them to accelerate your AI workloads on SageMaker Inference.