Powering Multimodal Intelligence for Video Search

Synchronizing the Senses: Powering Multimodal Intelligence for Video Search. By Meenakshi Jindal and Munya Marazanye. Today’s filmmakers capture more footage than ever to maximize their creative options, often generating hundreds, if not thousands, of hours of raw material per season or franchise. Extracting the vital moments needed to craft compelling storylines from this sheer volume of …

Envoy: A future-ready foundation for agentic AI networking

In today’s agentic AI environments, the network has a new set of responsibilities. In a traditional application stack, the network mainly moves requests between services. But as discussed in a recent white paper, Cloud Infrastructure in the Agent-Native Era, in an agentic system the network sits in the middle of model calls, tool invocations, agent-to-agent …

Personalized Group Relative Policy Optimization for Heterogenous Preference Alignment

Despite their sophisticated general-purpose capabilities, Large Language Models (LLMs) often fail to align with diverse individual preferences because standard post-training methods, like Reinforcement Learning with Human Feedback (RLHF), optimize for a single, global objective. While Group Relative Policy Optimization (GRPO) is a widely adopted on-policy reinforcement learning framework, its group-based normalization implicitly assumes that all …

Smarter Live Streaming at Scale: Rolling Out VBR for All Netflix Live Events

By Renata Teixeira, Zhi Li, Reenal Mahajan, and Wei Wei. On January 26, 2026, we flipped an important switch for Live at Netflix: all Live events are now encoded using VBR (Variable Bitrate) instead of CBR (Constant Bitrate). It sounds like a small configuration change, but it required us to revisit some of the foundational assumptions …

Simulate realistic users to evaluate multi-turn AI agents in Strands Evals

Evaluating single-turn agent interactions follows a pattern that most teams understand well. You provide an input, collect the output, and judge the result. Frameworks like Strands Evaluation SDK make this process systematic through evaluators that assess helpfulness, faithfulness, and tool usage. In a previous blog post, we covered how to build comprehensive evaluation suites for …

How Honeylove boosts product quality and service efficiency with BigQuery

Building the perfect bra takes thousands of data points. That’s why Honeylove isn’t just another intimates brand. We’re a technology company that happens to make exceptional bras, tops, shapewear, and bodysuits. Technology shapes everything we do, from how we iterate garments based on customer feedback to how we optimize sizing across those thousands of data …

Automating competitive price intelligence with Amazon Nova Act

Monitoring competitor prices is essential for ecommerce teams to maintain a market edge. However, many teams remain trapped in manual tracking, wasting hours daily checking individual websites. This inefficient approach delays decision-making, raises operational costs, and risks human errors that result in missed revenue and lost opportunities. Amazon Nova Act is an open-source browser automation …

Run real-time and async inference on the same infrastructure with GKE Inference Gateway

As AI workloads transition from experimental prototypes to production-grade services, the infrastructure supporting them faces a growing utilization gap. Enterprises today typically face a binary choice: build for high-concurrency, low-latency real-time requests, or optimize for high-throughput, “async” processing. In Kubernetes environments, these requirements are traditionally handled by separate, siloed GPU and TPU accelerator clusters. Real-time …

ProText: A Benchmark Dataset for Measuring (Mis)gendering in Long-Form Texts

We introduce ProText, a dataset for measuring gendering and misgendering in stylistically diverse long-form English texts. ProText spans three dimensions: Theme nouns (names, occupations, titles, kinship terms), Theme category (stereotypically male, stereotypically female, gender-neutral/non-gendered), and Pronoun category (masculine, feminine, gender-neutral, none). The dataset is designed to probe (mis)gendering in text transformations such as summarization and …

Build reliable AI agents with Amazon Bedrock AgentCore Evaluations

Your AI agent worked in the demo, impressed stakeholders, handled test scenarios, and seemed ready for production. Then you deployed it, and the picture changed. Real users experienced wrong tool calls, inconsistent responses, and failure modes nobody anticipated during testing. The result is a gap between expected agent behavior and actual user experience in production. …