Categories: FAANG

Hugging Face Offers Developers Inference-as-a-Service Powered by NVIDIA NIM

One of the world’s largest AI communities — comprising 4 million developers on the Hugging Face platform — is gaining easy access to NVIDIA-accelerated inference on some of the most popular AI models.

New inference-as-a-service capabilities will enable developers to rapidly deploy leading large language models such as the Llama 3 family and Mistral AI models with optimization from NVIDIA NIM microservices running on NVIDIA DGX Cloud.

Announced today at the SIGGRAPH conference, the service will help developers quickly prototype with open-source AI models hosted on the Hugging Face Hub and deploy them in production. Enterprise Hub users can tap serverless inference for increased flexibility, minimal infrastructure overhead and optimized performance with NVIDIA NIM.

The inference service complements Train on DGX Cloud, an AI training service already available on Hugging Face.

Developers facing a growing number of open-source models can benefit from a hub where they can easily compare options. These training and inference tools give Hugging Face developers new ways to experiment with, test and deploy cutting-edge models on NVIDIA-accelerated infrastructure. They’re made easily accessible using the “Train” and “Deploy” drop-down menus on Hugging Face model cards, letting users get started with just a few clicks.

Get started with inference-as-a-service powered by NVIDIA NIM.

Beyond a Token Gesture — NVIDIA NIM Brings Big Benefits

NVIDIA NIM is a collection of AI microservices — including NVIDIA AI foundation models and open-source community models — optimized for inference using industry-standard application programming interfaces, or APIs.

NIM offers users higher efficiency in processing tokens — the units of data used and generated by a language model. The optimized microservices also improve the efficiency of the underlying NVIDIA DGX Cloud infrastructure, which can increase the speed of critical AI applications.

This means developers see faster, more robust results from an AI model accessed as a NIM compared with other versions of the model. The 70-billion-parameter version of Llama 3, for example, delivers up to 5x higher throughput when accessed as a NIM compared with off-the-shelf deployment on NVIDIA H100 Tensor Core GPU-powered systems.

Near-Instant Access to DGX Cloud Provides Accessible AI Acceleration

The NVIDIA DGX Cloud platform is purpose-built for generative AI, offering developers easy access to reliable accelerated computing infrastructure that can help them bring production-ready applications to market faster.

The platform provides scalable GPU resources that support every step of AI development, from prototype to production, without requiring developers to make long-term AI infrastructure commitments.

Hugging Face inference-as-a-service on NVIDIA DGX Cloud powered by NIM microservices offers easy access to compute resources that are optimized for AI deployment, enabling users to experiment with the latest AI models in an enterprise-grade environment.

More on NVIDIA NIM at SIGGRAPH 

At SIGGRAPH, NVIDIA also introduced generative AI models and NIM microservices for the OpenUSD framework to accelerate developers’ abilities to build highly accurate virtual worlds for the next evolution of AI.

To experience more than 100 NVIDIA NIM microservices with applications across industries, visit ai.nvidia.com.

AI Generated Robotic Content

Recent Posts

GenLayer offers novel approach for AI agent transactions: getting multiple LLMs to vote on a suitable contract

GenLayer is betting that AI-driven contracts, enforced on the blockchain, will be the foundation for…

15 mins ago

OPM Watchdog Says Review of DOGE Work Is Underway

The acting inspector general says the Office of Personnel Management is investigating whether any “emerging…

15 mins ago

New technique overcomes spurious correlations problem in AI

AI models often rely on "spurious correlations," making decisions based on unimportant and potentially misleading…

15 mins ago

Surreal September: Celebrating Our Winners and Highlights

Surreal September was more than just a challenge—it was about elevating your AI art skills…

23 hours ago

The great software rewiring: AI isn’t just eating everything; it is everything

Gen AI is not just another technology layer; it has the potential to eat the…

1 day ago

14 Best Tote Bags of 2025, Tested and Reviewed by WIRED

From beach days to board meetings, these top totes are designed to protect your valuables,…

1 day ago