ai/ml

SpecMD: A Comprehensive Study on Speculative Expert Prefetching

Mixture-of-Experts (MoE) models enable sparse expert activation, meaning that only a subset of the model’s parameters is used during each…

2 months ago

Cost effective deployment of vision-language models for pet behavior detection on AWS Inferentia2

Tomofun, the Taiwan-headquartered pet-tech startup behind the Furbo Pet Camera, is redefining how pet owners interact with their pets remotely.…

2 months ago

Pioneering AI-assisted code migration: How Google achieved 6x faster migration from TensorFlow to JAX

AI coding agents are rapidly becoming ubiquitous across the software industry, fundamentally changing how developers write, test, and debug daily…

2 months ago

Stochastic KV Routing: Enabling Adaptive Depth-Wise Cache Sharing

Serving transformer language models with high throughput requires caching Key-Values (KVs) to avoid redundant computation during autoregressive generation. The memory…

2 months ago

How Hapag-Lloyd uses Amazon Bedrock to transform customer feedback into actionable insights

Hapag-Lloyd stands as one of the world’s leading liner shipping companies, operating a modern fleet of 313 container ships with…

2 months ago

Five must-have guides to move agents into production with Gemini Enterprise Agent Platform

Building AI agents that work well in a demo is one thing, but running them in production requires serious infrastructure. …

2 months ago

PORTool: Importance-Aware Policy Optimization with Rewarded Tree for Multi-Tool-Integrated Reasoning

Multi-tool-integrated reasoning enables LLM-empowered tool-use agents to solve complex tasks by interleaving natural-language reasoning with calls to external tools. However,…

2 months ago

Democratizing Machine Learning at Netflix: Building the Model Lifecycle Graph

Saish Sali, Nipun Kumar, Sura ElamuruguIntroductionAs Netflix has grown, machine learning continues to support our ability to deliver value to…

2 months ago

Beyond BI: How the Dataset Q&A feature of Amazon Quick powers the next generation of data decisions

Business leaders across industries rely on operational dashboards as the shared source of truth that their teams execute against daily.…

2 months ago

Reinforced Agent: Inference-Time Feedback for Tool-Calling Agents

This paper was accepted at the Fifth Workshop on Natural Language Generation, Evaluation, and Metrics at ACL 2026. Tool-calling agents…

2 months ago