Categories: FAANG

How to Scale Your EMA

*=Equal Contributors
Preserving training dynamics across batch sizes is an important tool for practical machine learning as it enables the trade-off between batch size and wall-clock time. This trade-off is typically enabled by a scaling rule; for example, in stochastic gradient descent, one should scale the learning rate linearly with the batch size. Another important machine learning tool is the model EMA, a functional copy of a target model whose parameters move towards those of its target model according to an Exponential Moving Average (EMA) at a rate parameterized by a momentum…
AI Generated Robotic Content

Recent Posts

Can “Safe AI” Companies Survive in an Unrestrained AI Landscape?

TL;DR A conversation with 4o about the potential demise of companies like Anthropic. As artificial…

4 hours ago

Large language overkill: How SLMs can beat their bigger, resource-intensive cousins

Whether a company begins with a proof-of-concept or live deployment, they should start small, test…

5 hours ago

14 Best Planners: Weekly and Daily Notebooks & Accessories (2024)

Digital tools are not always superior. Here are some WIRED-tested agendas and notebooks to keep…

5 hours ago

5 Tools for Visualizing Machine Learning Models

Machine learning (ML) models are built upon data.

1 day ago

AI Systems Governance through the Palantir Platform

Editor’s note: This is the second post in a series that explores a range of…

1 day ago

Introducing Configurable Metaflow

David J. Berg*, David Casler^, Romain Cledat*, Qian Huang*, Rui Lin*, Nissan Pow*, Nurcan Sonmez*,…

1 day ago