FAANG

Cost effective deployment of vision-language models for pet behavior detection on AWS Inferentia2

Tomofun, the Taiwan-headquartered pet-tech startup behind the Furbo Pet Camera, is redefining how pet owners interact with their pets remotely.…

1 month ago

Pioneering AI-assisted code migration: How Google achieved 6x faster migration from TensorFlow to JAX

AI coding agents are rapidly becoming ubiquitous across the software industry, fundamentally changing how developers write, test, and debug daily…

1 month ago

Stochastic KV Routing: Enabling Adaptive Depth-Wise Cache Sharing

Serving transformer language models with high throughput requires caching Key-Values (KVs) to avoid redundant computation during autoregressive generation. The memory…

1 month ago

How Hapag-Lloyd uses Amazon Bedrock to transform customer feedback into actionable insights

Hapag-Lloyd stands as one of the world’s leading liner shipping companies, operating a modern fleet of 313 container ships with…

1 month ago

Five must-have guides to move agents into production with Gemini Enterprise Agent Platform

Building AI agents that work well in a demo is one thing, but running them in production requires serious infrastructure.…

1 month ago

PORTool: Importance-Aware Policy Optimization with Rewarded Tree for Multi-Tool-Integrated Reasoning

Multi-tool-integrated reasoning enables LLM-empowered tool-use agents to solve complex tasks by interleaving natural-language reasoning with calls to external tools. However,…

1 month ago

Democratizing Machine Learning at Netflix: Building the Model Lifecycle Graph

Saish Sali, Nipun Kumar, Sura Elamurugu. Introduction: As Netflix has grown, machine learning continues to support our ability to deliver value to…

1 month ago

Beyond BI: How the Dataset Q&A feature of Amazon Quick powers the next generation of data decisions

Business leaders across industries rely on operational dashboards as the shared source of truth that their teams execute against daily.…

1 month ago

Reinforced Agent: Inference-Time Feedback for Tool-Calling Agents

This paper was accepted at the Fifth Workshop on Natural Language Generation, Evaluation, and Metrics at ACL 2026. Tool-calling agents…

1 month ago

State of Routing in Model Serving

By Nipun Kumar, Rajat Shah, Peter Chng. Introduction: This is the first blog post in a multi-part series that shares technical insights into…

1 month ago