Categories: FAANG

How to Scale Your EMA

*=Equal Contributors
Preserving training dynamics across batch sizes is an important tool for practical machine learning as it enables the trade-off between batch size and wall-clock time. This trade-off is typically enabled by a scaling rule; for example, in stochastic gradient descent, one should scale the learning rate linearly with the batch size. Another important machine learning tool is the model EMA, a functional copy of a target model whose parameters move towards those of its target model according to an Exponential Moving Average (EMA) at a rate parameterized by a momentum…
AI Generated Robotic Content

Recent Posts

Python Concepts Every AI Engineer Must Master

Transitioning from writing local experimental scripts to building scalable, production-grade AI systems requires a shift…

8 hours ago

Building Supercharger: How Rocket Close optimized title operations with agentic AI

Rocket Close is a Detroit-based title agency and appraisal management company within Rocket Companies that…

8 hours ago

Introducing the Open Knowledge Format

As foundation models continue to improve, the lack of relevant context often limits what they…

8 hours ago

Meta Employees Absolutely Hate Mark Zuckerberg’s Plan for a Companywide AI Hackathon

“I’m not sure that this company supports a hackathon culture anymore,” one employee posted in…

9 hours ago

Brain-inspired chip runs near absolute zero and could transform quantum computing

Scientists at the University of Hong Kong have created a remarkable new type of brain-inspired…

9 hours ago

Human understanding of AI can’t keep up with its advancement, researchers say

In a recent editorial published in Science, Microsoft's chief scientific officer, Eric Horvitz, and researcher…

9 hours ago