
How to Scale Your EMA

Preserving training dynamics across batch sizes is an important tool for practical machine learning, as it enables trading batch size against wall-clock time. This trade-off is typically enabled by a scaling rule; for example, in stochastic gradient descent, one should scale the learning rate linearly with the batch size. Another important machine learning tool is the model EMA, a functional copy of a target model whose parameters move toward those of the target according to an Exponential Moving Average (EMA) at a rate parameterized by a momentum…
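The two scaling rules in play can be sketched as follows. This is a minimal illustration, not the paper's implementation: the linear learning-rate rule for SGD is the one mentioned above, and the exponential momentum scaling (momentum raised to the power of the batch-size ratio κ) is the EMA Scaling Rule this paper derives; the function name and example values are hypothetical.

```python
def scale_hyperparameters(lr: float, ema_momentum: float,
                          base_batch: int, new_batch: int) -> tuple[float, float]:
    """Rescale SGD and EMA hyperparameters for a new batch size.

    kappa is the batch-size scaling factor. The SGD learning rate is
    scaled linearly (lr * kappa), while the EMA momentum is scaled
    exponentially (momentum ** kappa), so that one EMA update at the
    large batch size matches kappa updates at the base batch size.
    """
    kappa = new_batch / base_batch
    return lr * kappa, ema_momentum ** kappa

# Example: scaling from batch size 256 to 1024 (kappa = 4).
lr, rho = scale_hyperparameters(lr=0.1, ema_momentum=0.9999,
                                base_batch=256, new_batch=1024)
```

Intuitively, the EMA momentum compounds multiplicatively across steps, which is why it scales as a power of κ rather than linearly.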
AI Generated Robotic Content
