Secure short-term GPU capacity for ML workloads with EC2 Capacity Blocks for ML and SageMaker training plans

As companies of various sizes adopt graphics processing unit (GPU)-based machine learning (ML) training, fine-tuning, and inference workloads, the demand for GPU capacity has outpaced industry-wide supply. This imbalance has made GPUs a scarce resource, creating a challenge for customers who need reliable access to GPU compute resources for their ML workloads. When you encounter …

Gemini 3.1 Flash-Lite is now generally available on Gemini Enterprise Agent Platform

Today, we’re thrilled to announce that Gemini 3.1 Flash-Lite, our fastest and most cost-efficient Gemini 3 series model yet, is now generally available. Designed for ultra-low latency, high-volume tasks, and unmatched cost-efficiency, Flash-Lite is already transforming how applications are built at scale. Fast, iterative, and scalable, it joins our comprehensive suite of Pro and Flash …

Inspired by the brain, researchers build smarter and more efficient computer hardware

As traditional computer chips reach their physical limits and artificial intelligence demands more energy than ever, University of Missouri researchers are rethinking how computers work by taking cues from the human brain. The timing is critical. Energy use from AI data centers is projected to double by the end of the decade, raising urgent questions …

SpecMD: A Comprehensive Study on Speculative Expert Prefetching

Mixture-of-Experts (MoE) models enable sparse expert activation, meaning that only a subset of the model’s parameters is used during each inference. However, to translate this sparsity into practical performance, an expert caching mechanism is required. Previous works have proposed hardware-centric caching policies, but how these various caching policies interact with each other and different hardware …

Cost-effective deployment of vision-language models for pet behavior detection on AWS Inferentia2

Tomofun, the Taiwan-headquartered pet-tech startup behind the Furbo Pet Camera, is redefining how pet owners interact with their pets remotely. Furbo combines smart cameras with AI to detect behaviors such as barking, running, or unusual activity, and alerts owners in real time. At the core of this capability are computer vision and vision-language models that …

Pioneering AI-assisted code migration: How Google achieved 6x faster migration from TensorFlow to JAX

AI coding agents are rapidly becoming ubiquitous across the software industry, fundamentally changing how developers write, test, and debug daily code. While these tools excel at localized, self-contained tasks, applying them to massive, systemic codebase migrations requires an entirely new approach. Google is already addressing this challenge by incorporating AI into many migration workflows: x86 …

AI training method helps robots carry lab-learned skills into real-world tasks

Robots are trained for specific tasks, such as cutting, using simulation. However, collecting real-world data is expensive, slow, and sometimes unsafe, particularly for tasks involving physical interaction. A new AI-based method co-developed by Aston University’s Dr. Alireza Rastegarpanah could revolutionize the way advanced robotic systems are trained for real-life tasks, making them more practical and …