Personalized Group Relative Policy Optimization for Heterogenous Preference Alignment

Despite their sophisticated general-purpose capabilities, Large Language Models (LLMs) often fail to align with diverse individual preferences because standard post-training methods, like Reinforcement Learning from Human Feedback (RLHF), optimize for a single, global objective. While Group Relative Policy Optimization (GRPO) is a widely adopted on-policy reinforcement learning framework, its group-based normalization implicitly assumes that all …
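For reference, the group-based normalization mentioned above is commonly described as follows: several responses are sampled for the same prompt, and each response's scalar reward is standardized against the mean and standard deviation of its own group. The sketch below is a minimal illustration under that assumption. The function name `group_relative_advantages`, the use of the population standard deviation, and the `eps` stabilizer are illustrative choices, not the paper's code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Standardize each reward against its own group (responses to one prompt).

    Hypothetical helper: every response is scored by one shared scalar reward,
    which is exactly the single-objective assumption the teaser questions.
    """
    mu = mean(rewards)        # group baseline
    sigma = pstdev(rewards)   # population std of the group (assumed choice)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to the same prompt, scored 1 (preferred) or 0.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Because every response in a group is measured against the same baseline, users whose preferences disagree with that shared reward are averaged away, which is the limitation the personalized variant sets out to address.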

Smarter Live Streaming at Scale: Rolling Out VBR for All Netflix Live Events

By Renata Teixeira, Zhi Li, Reenal Mahajan, and Wei Wei

On January 26, 2026, we flipped an important switch for Live at Netflix: all Live events are now encoded using VBR (Variable Bitrate) instead of CBR (Constant Bitrate). It sounds like a small configuration change, but it required us to revisit some of the foundational assumptions …

Simulate realistic users to evaluate multi-turn AI agents in Strands Evals

Evaluating single-turn agent interactions follows a pattern that most teams understand well. You provide an input, collect the output, and judge the result. Frameworks like the Strands Evaluation SDK make this process systematic through evaluators that assess helpfulness, faithfulness, and tool usage. In a previous blog post, we covered how to build comprehensive evaluation suites for …

How Honeylove boosts product quality and service efficiency with BigQuery

Building the perfect bra takes thousands of data points. That’s why Honeylove isn’t just another intimates brand. We’re a technology company that happens to make exceptional bras, tops, shapewear, and bodysuits. Technology shapes everything we do, from how we iterate garments based on customer feedback to how we optimize sizing across those thousands of data …

Crashing waves vs. rising tides: Overturning prior views about how AI could overtake human workers

Anthropic CEO Dario Amodei has said that AI could surpass “almost all humans at almost everything” shortly after 2027. While AI’s capabilities are certainly improving, such rapid progress might seem at odds with findings that show AI is still failing at 95%+ of remote freelance projects, and continues to struggle with hallucination, long-term planning, …

Automating competitive price intelligence with Amazon Nova Act

Monitoring competitor prices is essential for ecommerce teams to maintain a market edge. However, many teams remain trapped in manual tracking, wasting hours daily checking individual websites. This inefficient approach delays decision-making, raises operational costs, and risks human errors that result in missed revenue and lost opportunities. Amazon Nova Act is an open-source browser automation …

Run real-time and async inference on the same infrastructure with GKE Inference Gateway

As AI workloads transition from experimental prototypes to production-grade services, the infrastructure supporting them faces a growing utilization gap. Enterprises today typically face a binary choice: build for high-concurrency, low-latency real-time requests, or optimize for high-throughput, “async” processing. In Kubernetes environments, these requirements are traditionally handled by separate, siloed GPU and TPU accelerator clusters. Real-time …