AI-powered robot learns how to harvest tomatoes more efficiently

A new tomato-picking robot is learning to think before it acts. Instead of simply identifying ripe fruit, it predicts how easy each tomato will be to harvest and adjusts its approach accordingly. This smarter strategy boosted success rates to 81%, with the robot even switching angles when needed. The breakthrough could pave the way for …

Trained on Tokens, Calibrated on Concepts: The Emergence of Semantic Calibration in LLMs

Large Language Models (LLMs) often lack meaningful confidence estimates for their outputs. While base LLMs are known to exhibit next-token calibration, it remains unclear whether they can assess confidence in the actual meaning of their responses beyond the token level. We find that, when using a certain sampling-based notion of semantic calibration, base LLMs are …
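The abstract is truncated, but sampling-based semantic confidence measures of this kind generally work by drawing several answers at nonzero temperature, grouping them by meaning, and reading confidence off the cluster frequencies. A minimal sketch of that idea (the `normalize` string proxy stands in for a real semantic-equivalence check such as an entailment model, and the function name is ours, not the paper's):

```python
from collections import Counter

def semantic_confidence(samples, normalize=lambda s: s.strip().lower()):
    """Estimate confidence in a response's *meaning* by sampling.

    `samples` is a list of answers drawn from a model at nonzero
    temperature. Answers are grouped into meaning clusters (here via a
    toy string-normalization proxy) and confidence is the fraction of
    samples landing in the most common cluster.
    """
    clusters = Counter(normalize(s) for s in samples)
    top_answer, top_count = clusters.most_common(1)[0]
    return top_answer, top_count / len(samples)

# Four of five samples agree in meaning, so confidence is 0.8.
answer, conf = semantic_confidence(["Paris", "paris", " Paris ", "Lyon", "Paris"])
```

Token-level calibration, by contrast, only scores the probability of the exact token sequence, which penalizes paraphrases that mean the same thing.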


Manufacturing with the Connected Edge

Industrial and defense environments generate massive amounts of data that can’t wait for the cloud. Latency is often measured in milliseconds, and resiliency is paramount. A manufacturing plant can’t go down due to flaky Wi-Fi or a public cloud outage. “Traditional” approaches — shipping servers, hiring local IT, bespoke development, managing one-off deployments — simply don’t scale. Critical operations …

Scaling Global Storytelling: Modernizing Localization Analytics at Netflix

Valentin Geffrier, Tanguy Cornuau

Each year, we bring the Analytics Engineering community together for an Analytics Summit — a multi-day internal conference to share analytical deliverables across Netflix, discuss analytic practice, and build relationships within the community. This post is one of several topics presented at the Summit highlighting the breadth and impact of Analytics work across different …


Deploy SageMaker AI inference endpoints with set GPU capacity using training plans

Deploying large language models (LLMs) for inference requires reliable GPU capacity, especially during critical evaluation periods, limited-duration production testing, or burst workloads. Capacity constraints can delay deployments and impact application performance. Customers can use Amazon SageMaker AI training plans to reserve compute capacity for specified time periods. Originally designed for training workloads, training plans now …


Kubernetes as AI Infrastructure: Google Cloud, llm-d, and the CNCF

At Google Cloud, serving the massive-scale needs of large foundation model builders and AI-native companies is at the forefront of our AI infrastructure strategy. As generative AI transitions to mission-critical production environments, these innovators require dynamic, relentlessly efficient infrastructure to overcome complex orchestration challenges and power an agentic future. To meet this moment, we are …


Optimizing Recommendation Systems with JDK’s Vector API

By Harshad Sane

Ranker is one of the largest and most complex services at Netflix. Among other things, it powers the personalized rows you see on the Netflix homepage, and runs at an enormous scale. When we looked at CPU profiles for this service, one feature kept standing out: video serendipity scoring — the logic that answers a …
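The JDK Vector API lets a hot Java scoring loop process many lanes per instruction instead of one multiply-add at a time. As a language-neutral sketch of that data-parallel rewrite, here is a scalar loop next to a NumPy-vectorized version of a hypothetical dot-product-style score (`score_scalar`, `score_vectorized`, and the scoring itself are illustrative stand-ins, not Netflix's actual serendipity logic):

```python
import numpy as np

def score_scalar(user, videos):
    # Baseline: one multiply-add per (video, feature) pair.
    scores = []
    for v in videos:
        s = 0.0
        for a, b in zip(user, v):
            s += a * b
        scores.append(s)
    return scores

def score_vectorized(user, videos):
    # Data-parallel rewrite: the whole feature dimension, across all
    # videos, is computed as wide vector operations — the same shape of
    # speedup the JDK Vector API targets with explicit SIMD lanes.
    return (np.asarray(videos) @ np.asarray(user)).tolist()

user = [0.2, 0.5, 0.3]
videos = [[1.0, 0.0, 1.0], [0.0, 2.0, 0.0]]
```

In Java, the equivalent rewrite would use `FloatVector` species and `lanewise` operations rather than a library call, but the structural change to the loop is the same.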


Building specialized AI without sacrificing intelligence: Nova Forge data mixing in action

Large language models (LLMs) perform well on general tasks but struggle with specialized work that requires understanding proprietary data, internal processes, and industry-specific terminology. Supervised fine-tuning (SFT) adapts LLMs to these organizational contexts. SFT can be implemented through two distinct methodologies: Parameter-Efficient Fine-Tuning (PEFT), which updates only a subset of model parameters, offering faster training …
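To make the PEFT trade-off concrete: its popular low-rank adaptation (LoRA) variant freezes the base weight matrix and trains only a small low-rank delta. A back-of-envelope sketch of the parameter savings (the dimensions and rank are illustrative choices of ours, not Nova Forge's configuration):

```python
import numpy as np

d_out, d_in, rank = 512, 512, 8

# Full fine-tuning updates every entry of the weight matrix.
full_params = d_out * d_in

# LoRA freezes W and learns only B @ A, where B is (d_out, rank)
# and A is (rank, d_in) — a tiny fraction of the full matrix.
lora_params = d_out * rank + rank * d_in

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))
A = np.zeros((rank, d_in))   # zero init: the delta starts at 0
B = rng.standard_normal((d_out, rank))

W_eff = W + B @ A            # effective weight during adaptation

print(f"trainable fraction: {lora_params / full_params:.4f}")
```

At rank 8 on a 512x512 layer, only about 3% of the parameters are trainable, which is where PEFT's speed and memory advantages over full-parameter SFT come from.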


Designing private network connectivity for RAG-capable gen AI apps

The flexibility of Google Cloud allows enterprises to build secure and reliable architectures for their AI workloads. In this blog, we look at a reference architecture for private connectivity for retrieval-augmented generation (RAG)-capable generative AI applications. This architecture is for scenarios where communications of the overall system must use private IP addresses and must …

Mount Mayhem at Netflix: Scaling Containers on Modern CPUs

Authors: Harshad Sane, Andrew Halaney

Imagine this — you click play on Netflix on a Friday night, and behind the scenes hundreds of containers spring into action within a few seconds to answer your call. At Netflix, scaling containers efficiently is critical to delivering a seamless streaming experience to millions of members worldwide. To keep up with responsiveness …