image 36
Today, we are excited to announce the day-zero availability of NVIDIA Nemotron 3 Ultra on Amazon SageMaker JumpStart.
With this launch, you can now deploy the Nemotron 3 Ultra model using a one-click deployment experience. Nemotron 3 Ultra is an open model built for frontier reasoning and orchestration in long-running autonomous agents, delivering 5x faster inference and up to 30% lower cost for agentic workloads. Nemotron 3 Ultra is optimized for the NVFP4 format, which makes the model much faster and cost effective to host.
NVIDIA Nemotron 3 Ultra is an open large language model with 550 billion total parameters and 55 billion active parameters. It is built on a hybrid Transformer-Mamba Mixture-of-Experts (MoE) architecture, designed to deliver frontier intelligence at a fraction of the compute cost of dense models of equivalent quality.
| Specification | Details |
|---|---|
| Architecture | Hybrid Transformer-Mamba MoE |
| Parameters | 550B total / 55B active |
| Context length | Up to 1M tokens |
| Input / Output | Text in, text out |
| Precision | NVFP4 |
| Inference speed | 5x faster for long-running agent workflows |
| Cost | Up to 30% lower for complex agentic tasks |
Agents don’t just answer once. They plan, call tools, delegate work to sub-agents, check results, and keep going across hundreds of turns. Every step adds tokens and compute, so the metrics that matter are task completion at useful accuracy, time-to-finish, and cost-per-task.
Nemotron 3 Ultra addresses this directly. Its MoE architecture activates only 55B of its 550B parameters per forward pass, keeping throughput high even at million-token context lengths. This means agents can sustain planning, tool calling, and self-correction loops that span hundreds of turns while helping maintain coherence and manage cost.
Nemotron 3 Ultra excels in workloads that require sustained multi-step reasoning:
You can deploy Nemotron 3 Ultra through Amazon SageMaker JumpStart with one-click deployment, removing the need to manage infrastructure or configure serving frameworks.
Before you begin, make sure you have:
Important: Deploying this model creates a SageMaker endpoint that incurs charges while running. GPU instances like ml.p5en.48xlarge can cost several dollars per hour. See Amazon SageMaker AI pricing for details. Remember to delete your endpoint when finished to avoid ongoing charges.
Run inference
To avoid incurring unnecessary charges, delete the SageMaker endpoint when you are done:predictor.delete_endpoint()
NVIDIA Nemotron 3 Ultra brings frontier-class reasoning to Amazon SageMaker JumpStart with 5x faster inference and up to 30% lower cost for agentic workloads. Its hybrid Transformer-Mamba MoE architecture and million-token context window make it purpose-built for the sustained, multi-step reasoning that production agents demand.
Whether you are building agent orchestrators, coding agents, deep research systems, or complex enterprise automation, Nemotron 3 Ultra is ready to deploy today from SageMaker JumpStart.
Get started now by searching for Nemotron 3 Ultra in Amazon SageMaker JumpStart.
In this article, you will learn why a large context window is not the same…
When your document repository contains hundreds of millions of files accumulated over nearly a decade,…
The Skylight Calendar 2 and Calendar Max are both on sale for Prime Day if…
A research team led by Sant'Anna School of Advanced Studies in Pisa, in collaboration with…
Hey everyone, We're the team behind Krea, and today we're launching Krea 2, our new…
The current era of Generative AI seems to primarily focus on chat interfaces and prompts,…