
Democratizing Machine Learning at Netflix: Building the Model Lifecycle Graph

By Saish Sali, Nipun Kumar, and Sura Elamurugu. As Netflix has grown, machine learning continues to support our ability to deliver value to members and drive excellence across multiple areas of our business. When Netflix began investing in machine learning over a decade ago, it was primarily focused on a single domain: personalization. Scala was the …


Beyond BI: How the Dataset Q&A feature of Amazon Quick powers the next generation of data decisions

Business leaders across industries rely on operational dashboards as the shared source of truth that their teams execute against daily. But dashboards are built to answer known questions. When teams need to explore ad-hoc, multi-dimensional, or unforeseen questions, they hit a bottleneck: they wait hours or days for BI teams to build new views …

Reinforced Agent: Inference-Time Feedback for Tool-Calling Agents

This paper was accepted at the Fifth Workshop on Natural Language Generation, Evaluation, and Metrics at ACL 2026. Tool-calling agents are evaluated on tool selection, parameter accuracy, and scope recognition, yet LLM trajectory assessments remain inherently post-hoc. Disconnected from the active execution loop, such assessments identify errors that are usually addressed through prompt-tuning or retraining, …


State of Routing in Model Serving

By Nipun Kumar, Rajat Shah, and Peter Chng. This is the first blog post in a multi-part series that shares technical insights into how our ML model serving infrastructure powers several personalized experiences at scale across various domains (e.g., title recommendations, commerce). In this introductory blog post, we will dive into our domain-independent API abstraction and …


AWS Transform now automates BI migration to Amazon Quick in days

Migrating to Amazon Quick doesn’t have to mean starting from scratch. Your dashboards encode hard-won domain knowledge: calculated fields your analysts perfected, layouts your executives rely on every Monday morning, security rules tuned to your org chart. You want AI-powered insights and serverless scale, but you’re staring at hundreds of dashboards and a migration estimate …

STARFlow-V: End-to-End Video Generative Modeling with Normalizing Flows

Normalizing flows (NFs) are end-to-end likelihood-based generative models for continuous data, and have recently regained attention with encouraging progress on image generation. Yet in the video generation domain, where spatiotemporal complexity and computational cost are substantially higher, state-of-the-art systems almost exclusively rely on diffusion-based models. In this work, we revisit this design space by presenting …

Ready, Set, Build with the NHS Federated Data Platform

The National Health Service (NHS) has delivered universal healthcare to an entire nation for over 75 years. With 1.5 million staff providing care to approximately 57 million patients across hundreds of hospital trusts — using decades-old legacy infrastructure — the NHS is one of the most operationally complex organisations on Earth. For most of its history, the NHS has …


Reinforcement fine-tuning with LLM-as-a-judge

Large language models (LLMs) now drive the most advanced conversational agents, creative tools, and decision-support systems. However, their raw output often contains inaccuracies, policy misalignments, or unhelpful phrasing, issues that undermine trust and limit real-world utility. Reinforcement Fine-Tuning (RFT) has emerged as the preferred method to align these models efficiently, using automated reward signals to replace …


Cloud CISO Perspectives: At Next ‘26, why we’re multicloud and multi-AI

Welcome to the second Cloud CISO Perspectives for April 2026. Today, Francis deSouza, COO of Google Cloud and President of Security Products, explains why Google is multicloud and multi-AI, straight from Next ‘26. As with all Cloud CISO Perspectives, the contents of this newsletter are posted to the Google Cloud blog. If you’re reading this on the …

Adaptive Thinking: Large Language Models Know When to Think in Latent Space

Recent advances in large language model (LLM) test-time computing have introduced the capability to perform intermediate chain-of-thought (CoT) reasoning (“thinking”) before generating answers. While increasing the thinking budget yields smooth performance improvements at inference time, the relationship between LLM capability, query complexity, and optimal budget allocation remains poorly understood for achieving compute-optimal inference. To address …