
NVIDIA Nemotron 3 Nano 30B MoE model is now available in Amazon SageMaker JumpStart

Today we’re excited to announce that the NVIDIA Nemotron 3 Nano 30B model with 3B active parameters is now generally available in the Amazon SageMaker JumpStart model catalog. You can accelerate innovation and deliver tangible business value with Nemotron 3 Nano on Amazon Web Services (AWS) without having to manage model deployment complexities. You can …

Build financial resilience with AI-powered tabletop exercises on Google Cloud

In the financial sector, resilience isn’t optional. Recent cloud outages have shown us exactly how fast critical data can disappear. The risk is amplified by major regulatory drivers like the Digital Operational Resilience Act (DORA), which mandates that financial institutions be ready for any disruption. The recent designation of Google Cloud as a Critical Third-Party …

Parallel Track Transformers: Enabling Fast GPU Inference with Reduced Synchronization

Efficient large-scale inference of transformer-based large language models (LLMs) remains a fundamental systems challenge, frequently requiring multi-GPU parallelism to meet stringent latency and throughput targets. Conventional tensor parallelism decomposes matrix operations across devices but introduces substantial inter-GPU synchronization, leading to communication bottlenecks and degraded scalability. We propose the Parallel Track (PT) Transformer, a novel architectural …
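As a minimal illustration of the synchronization cost the abstract describes (not the Parallel Track architecture itself), the sketch below splits a matrix–vector product column-wise across two simulated devices, the standard tensor-parallel decomposition. The partial results must then be summed, which stands in for the all-reduce step where real multi-GPU deployments pay their communication cost; the matrices and split are hypothetical toy values.

```python
# Toy sketch of tensor (column) parallelism: a weight matrix W is split
# across two simulated "devices"; each computes a partial product, and an
# all-reduce (here, an elementwise sum) synchronizes the results.
# This illustrates the synchronization point discussed above, not an
# implementation of the Parallel Track Transformer.

def matvec(rows, x):
    """Dense matrix-vector product over nested lists."""
    return [sum(w * v for w, v in zip(row, x)) for row in rows]

# Full 2x4 weight matrix and a length-4 input vector (toy values).
W = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
x = [1, 1, 2, 2]

# Column-parallel split: device 0 holds columns 0-1, device 1 holds columns 2-3.
W0 = [row[:2] for row in W]
W1 = [row[2:] for row in W]
x0, x1 = x[:2], x[2:]

# Each device computes its partial result independently (no communication).
partial0 = matvec(W0, x0)
partial1 = matvec(W1, x1)

# All-reduce: the inter-GPU synchronization point. Every layer adds one of
# these, which is the communication bottleneck reduced-sync designs target.
y = [a + b for a, b in zip(partial0, partial1)]

assert y == matvec(W, x)  # matches the unsplit computation
```

In a real deployment the elementwise sum is a collective operation over the interconnect, so its cost scales with how often layers must synchronize rather than with the arithmetic itself.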


How Amazon uses Amazon Nova models to automate operational readiness testing for new fulfillment centers

Amazon is a global ecommerce and technology company that operates a vast network of fulfillment centers to store, process, and ship products to customers worldwide. The Amazon Global Engineering Services (GES) team is responsible for facilitating operational readiness across the company’s rapidly expanding network of fulfillment centers. When launching new fulfillment centers, Amazon must verify …

Gemini Enterprise Agent Ready (GEAR) program now available, a new path to building AI agents at scale

Today’s reality is agentic: software that can reason, plan, and act on your behalf to execute complex workflows. To meet this moment, we are excited to open the Gemini Enterprise Agent Ready (GEAR) learning program to everyone. As a new specialized pathway within the Google Developer Program, GEAR empowers developers and professionals to build …


Automated Reasoning checks rewriting chatbot reference implementation

Today, we are publishing a new open source sample chatbot that shows how to use feedback from Automated Reasoning checks to iterate on generated content, ask clarifying questions, and prove the correctness of an answer. The chatbot implementation also produces an audit log that includes mathematically verifiable explanations of answer validity and a …

How PARTs Assemble into Wholes: Learning the Relative Composition of Images

The composition of objects and their parts, along with object-object positional relationships, provides a rich source of information for representation learning. Hence, spatial-aware pretext tasks have been actively explored in self-supervised learning. Existing works commonly start from a grid structure, where the goal of the pretext task involves predicting the absolute position index of patches …


Structured outputs on Amazon Bedrock: Schema-compliant AI responses

Today, we’re announcing structured outputs on Amazon Bedrock: a capability that fundamentally transforms how you obtain validated JSON responses from foundation models, using constrained decoding to enforce schema compliance. This represents a paradigm shift in AI application development. Instead of validating JSON responses and writing fallback logic for when they fail, you can move straight to …
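To make the schema-compliance idea concrete, here is a minimal, Bedrock-independent sketch: the `invoice_schema` and the `conforms` helper below are hypothetical illustrations of what "schema-compliant JSON" means, not the Bedrock API or its request parameters. With constrained decoding, the model's output is guaranteed to satisfy such a schema at generation time, so the check shown here never fails.

```python
import json

# Hypothetical illustration of schema-compliant output (not the Bedrock API).
# With structured outputs, the model is constrained during decoding so its
# response already satisfies a JSON Schema like this one, instead of the
# application validating free-form text after the fact.
invoice_schema = {
    "type": "object",
    "required": ["invoice_id", "total"],
    "properties": {
        "invoice_id": {"type": "string"},
        "total": {"type": "number"},
    },
}

def conforms(data, schema):
    """Checks a tiny subset of JSON Schema, for illustration only:
    required keys must exist and property types must match."""
    type_map = {"string": str, "number": (int, float), "object": dict}
    if not isinstance(data, type_map[schema["type"]]):
        return False
    if any(key not in data for key in schema.get("required", [])):
        return False
    return all(
        isinstance(data[k], type_map[spec["type"]])
        for k, spec in schema.get("properties", {}).items()
        if k in data
    )

# A constrained-decoding response is guaranteed to parse and conform,
# so the fallback/retry logic for malformed JSON can be dropped.
response_text = '{"invoice_id": "INV-1001", "total": 249.5}'
parsed = json.loads(response_text)
assert conforms(parsed, invoice_schema)
```

The design point is where enforcement happens: post-hoc validation rejects bad outputs after generation, while constrained decoding prevents them from being generated at all.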


How we cut Vertex AI latency by 35% with GKE Inference Gateway

As generative AI moves from experimentation to production, platform engineers face a universal challenge for inference serving: you need low latency, high throughput, and manageable costs. It is a difficult balance. Traffic patterns vary wildly, from complex coding tasks that require processing huge amounts of data, to quick, chatty conversations that demand instant replies. Standard …


How Associa transforms document classification with the GenAI IDP Accelerator and Amazon Bedrock

This is a guest post co-written with David Meredith and Josh Zacharias from Associa. Associa, North America’s largest community management company, oversees approximately 7.5 million homeowners with 15,000 employees across more than 300 branch offices. The company manages approximately 48 million documents across 26 TB of data, but its existing document management system lacks efficient …