Reinforced Agent: Inference-Time Feedback for Tool-Calling Agents

This paper was accepted at the Fifth Workshop on Natural Language Generation, Evaluation, and Metrics at ACL 2026. Tool-calling agents are evaluated on tool selection, parameter accuracy, and scope recognition, yet LLM trajectory assessments remain inherently post-hoc. Disconnected from the active execution loop, such assessments identify errors that are usually addressed through prompt-tuning or retraining, …

State of Routing in Model Serving

By Nipun Kumar, Rajat Shah, Peter Chng

Introduction: This is the first blog post in a multi-part series that shares technical insights into how our ML model serving infrastructure powers several personalized experiences at scale across various domains (e.g., title recommendations, commerce). In this introductory blog post, we will dive into our domain-independent API abstraction and …

AWS Transform now automates BI migration to Amazon Quick in days

Migrating to Amazon Quick doesn’t have to mean starting from scratch. Your dashboards encode hard-won domain knowledge: calculated fields your analysts perfected, layouts your executives rely on every Monday morning, security rules tuned to your org chart. You want AI-powered insights and serverless scale, but you’re staring at hundreds of dashboards and a migration estimate …

A new type of optical chip cuts static power while enabling electrical reprogramming

As technology advances, and the demand for faster, higher-bandwidth, and more energy-efficient data processing continues to grow, scientists and engineers search for ways to improve electronic systems. One avenue they have been exploring is optoelectronics—the study and application of electronic devices that interface with light by detecting, emitting, or converting it into electrical signals.

STARFlow-V: End-to-End Video Generative Modeling with Normalizing Flows

Normalizing flows (NFs) are end-to-end likelihood-based generative models for continuous data, and have recently regained attention with encouraging progress on image generation. Yet in the video generation domain, where spatiotemporal complexity and computational cost are substantially higher, state-of-the-art systems almost exclusively rely on diffusion-based models. In this work, we revisit this design space by presenting …

Ready, Set, Build with the NHS Federated Data Platform

The National Health Service (NHS) has delivered universal healthcare to an entire nation for over 75 years. With 1.5 million staff providing care to approximately 57 million patients across hundreds of hospital trusts, using decades-old legacy infrastructure, the NHS is one of the most operationally complex organisations on earth. For most of its history, the NHS has …

Reinforcement fine-tuning with LLM-as-a-judge

Large language models (LLMs) now drive the most advanced conversational agents, creative tools, and decision-support systems. However, their raw output often contains inaccuracies, policy misalignments, or unhelpful phrasing—issues that undermine trust and limit real-world utility. Reinforcement Fine‑Tuning (RFT) has emerged as the preferred method to align these models efficiently, using automated reward signals to replace …