ml 18665 image 1 1

How PDI built an enterprise-grade RAG system for AI applications with AWS

PDI Technologies is a global leader in the convenience retail and petroleum wholesale industries. They help businesses around the globe increase efficiency and profitability by securely connecting their data and operations. With 40 years of experience, PDI Technologies assists customers in all aspects of their business, from understanding consumer behavior to simplifying technology ecosystems across …

Scaling WideEP Mixture-of-Experts inference with Google Cloud A4X (GB200) and NVIDIA Dynamo

As organizations transition from standard LLMs to massive Mixture-of-Experts (MoE) architectures like DeepSeek-R1, the primary constraint has shifted from raw compute density to communication latency and memory bandwidth. Today, we’re releasing two new validated recipes designed to help customers overcome the infrastructure bottlenecks of the agentic AI era. These new recipes provide clear steps to …

DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation

Diffusion large language models (dLLMs) are compelling alternatives to autoregressive (AR) models because their denoising models operate over the entire sequence. The global planning and iterative refinement features of dLLMs are particularly useful for code generation. However, current training and inference mechanisms for dLLMs in coding are still under-explored. To demystify the decoding behavior of …

Agentic Orchestrator

How Thomson Reuters built an Agentic Platform Engineering Hub with Amazon Bedrock AgentCore

This post was co-written with Naveen Pollamreddi and Seth Krause from Thomson Reuters. Thomson Reuters (TR) is a leading AI and technology company dedicated to delivering trusted content and workflow automation solutions. With over 150 years of expertise, TR provides essential solutions across legal, tax, accounting, risk, trade, and media sectors in a fast-evolving world. AI …

ML 20078 image 1

Introducing multimodal retrieval for Amazon Bedrock Knowledge Bases

We are excited to announce the general availability of multimodal retrieval for Amazon Bedrock Knowledge Bases. This new capability adds native support for video and audio content, on top of text and images. With it you can build Retrieval Augmented Generation (RAG) applications that can search and retrieve information across text, images, audio, and video—all …

ParaRNN: Unlocking Parallel Training of Nonlinear RNNs for Large Language Models

Recurrent Neural Networks (RNNs) laid the foundation for sequence modeling, but their intrinsic sequential nature restricts parallel computation, creating a fundamental barrier to scaling. This has led to the dominance of parallelizable architectures like Transformers and, more recently, State Space Models (SSMs). While SSMs achieve efficient parallelization through structured linear recurrences, this linearity constraint limits …

ML 198324

Advanced fine-tuning techniques for multi-agent orchestration: Patterns from Amazon at scale

Our work with large enterprise customers and Amazon teams has revealed that high stakes use cases continue to benefit significantly from advanced large language model (LLM) fine-tuning and post-training techniques. In this post, we show you how fine-tuning enabled a 33% reduction in dangerous medication errors (Amazon Pharmacy), engineering 80% human effort reduction (Amazon Global …

Tom Currymax 1000x1000 1

Cloud CISO Perspectives: Practical guidance on building with SAIF

Welcome to the first Cloud CISO Perspectives for January 2026. Today, Tom Curry and Anton Chuvakin, from Google Cloud’s Office of the CISO, share our new report on using Google’s Secure AI Framework with Google Cloud capabilities and services to build boldly and responsibly with AI. As with all Cloud CISO Perspectives, the contents of …

ML 19565 diagram 1png

How the Amazon AMET Payments team accelerates test case generation with Strands Agents

At Amazon.ae, we serve approximately 10 million customers monthly across five countries in the Middle East and North Africa region—United Arab Emirates (UAE), Saudi Arabia, Egypt, Türkiye, and South Africa. Our AMET (Africa, Middle East, and Türkiye) Payments team manages payment selections, transactions, experiences, and affordability features across these diverse countries, publishing on average five …

Introducing BigQuery managed and SQL-native inference for open models

BigQuery provides access to a variety of LLMs for text and embedding generation, including Google’s Gemini models, Google-managed models from partners like Anthropic and Mistral. Using Gemini models and Google-managed partner models in BigQuery is simple — just create the model with the foundation model name and run inference directly in SQL queries. Today, we …