The new AI literacy: Insights from student developers

AI has made it easier than ever for student developers to work efficiently, tackle harder problems, and pursue ambitious projects. But for students earning technical degrees, these new capabilities also create genuine tensions around learning.  How much should I use AI? What should I use it for?  As 90% of technology professionals now use AI …

Exclusive Self Attention

We introduce exclusive self attention (XSA), a simple modification of self attention (SA) that improves Transformer’s sequence modeling performance. The key idea is to constrain attention to capture only information orthogonal to the token’s own value vector (thus excluding information of self position), encouraging better context modeling. Evaluated on the standard language modeling task, XSA …

ml 20634 image 1

Unlocking video insights at scale with Amazon Bedrock multimodal models

Video content is now everywhere, from security surveillance and media production to social platforms and enterprise communications. However, extracting meaningful insights from large volumes of video remains a major challenge. Organizations need solutions that can understand not only what appears in a video, but also the context, narrative, and underlying meaning of the content. In …

DRA Blog Diagrammax 1000x1000 1

DRA: A new era of Kubernetes device management with Dynamic Resource Allocation

The explosion of large language models (LLMs) has increased demand for high-performance accelerators like GPUs and TPUs. As organizations scale their AI capabilities, the scarcity of compute resources is sometimes the primary bottleneck. Efficiently managing every GPU and TPU cycle is no longer just a recommendation — it’s an operational necessity. Kubernetes is becoming the …

AI-powered robot learns how to harvest tomatoes more efficiently

A new tomato-picking robot is learning to think before it acts. Instead of simply identifying ripe fruit, it predicts how easy each tomato will be to harvest and adjusts its approach accordingly. This smarter strategy boosted success rates to 81%, with the robot even switching angles when needed. The breakthrough could pave the way for …

Trained on Tokens, Calibrated on Concepts: The Emergence of Semantic Calibration in LLMs

Large Language Models (LLMs) often lack meaningful confidence estimates for their outputs. While base LLMs are known to exhibit next-token calibration, it remains unclear whether they can assess confidence in the actual meaning of their responses beyond the token level. We find that, when using a certain sampling-based notion of semantic calibration, base LLMs are …

1xH XWy eonIkWdzq Im8rw

Manufacturing with the Connected Edge

Industrial and defense environments generate massive amounts of data that can’t wait for the cloud. Latency is often measured in milliseconds, and resiliency is paramount. A manufacturing plant can’t go down due to flaky Wi-Fi or a public cloud outage. “Traditional” approaches — shipping servers, hiring local IT, bespoke development, managing one-off deployments — simply don’t scale. Critical operations …

Scaling Global Storytelling: Modernizing Localization Analytics at Netflix

Valentin Geffrier, Tanguy Cornuau Each year, we bring the Analytics Engineering community together for an Analytics Summit — a multi-day internal conference to share analytical deliverables across Netflix, discuss analytic practice, and build relationships within the community. This post is one of several topics presented at the Summit highlighting the breadth and impact of Analytics work across different …

ML 20083 image 1

Deploy SageMaker AI inference endpoints with set GPU capacity using training plans

Deploying large language models (LLMs) for inference requires reliable GPU capacity, especially during critical evaluation periods, limited-duration production testing, or burst workloads. Capacity constraints can delay deployments and impact application performance. Customers can use Amazon SageMaker AI training plans to reserve compute capacity for specified time periods. Originally designed for training workloads, training plans now …

1 KwJQrYdmax 1000x1000 1

Kubernetes as AI Infrastructure: Google Cloud, llm-d, and the CNCF

At Google Cloud, serving the massive-scale needs of large foundation model builders and AI-native companies is at the forefront of our AI infrastructure strategy. As generative AI transitions to mission-critical production environments, these innovators require dynamic, relentlessly efficient infrastructure to overcome complex orchestration challenges and power an agentic future. To meet this moment, we are …