Categories: FAANG

How to Scale Your EMA

*=Equal Contributors
Preserving training dynamics across batch sizes is an important tool for practical machine learning as it enables the trade-off between batch size and wall-clock time. This trade-off is typically enabled by a scaling rule; for example, in stochastic gradient descent, one should scale the learning rate linearly with the batch size. Another important machine learning tool is the model EMA, a functional copy of a target model whose parameters move towards those of its target model according to an Exponential Moving Average (EMA) at a rate parameterized by a momentum…
AI Generated Robotic Content

Recent Posts

Its still nuts to me how realistic AI is getting, incredible i can run it on a RTX2060 and get these results. (Z-image-Turbo)

Every image is made with Z-Image-Turbo (See links for loras and prompts) A few of…

9 hours ago

Best Live-Captioning Smart Glasses (2026), WIRED tested

Can’t hear what they’re saying? Now you can turn on the subtitles for real-life conversations.

10 hours ago

Flux.2-Klein pipeline for real-time webcam stream processing in 30 FPS

I have built a pipeline based on the Flux.2-Klein-4B model that allows processing of a…

1 day ago

Implementing Permission-Gated Tool Calling in Python Agents

AI agents have evolved beyond passive chatbots.

1 day ago

Adaptive Parallel Reasoning: The Next Paradigm in Efficient Inference Scaling

Overview of adaptive parallel reasoning. What if a reasoning model could decide for itself when…

1 day ago

Scaling ArchUnit with Nebula ArchRules

By John Burns and Emily YuanIntroductionAt Netflix, we operate using a polyrepo strategy with tens of…

1 day ago