Categories: FAANG

How to Scale Your EMA

*=Equal Contributors
Preserving training dynamics across batch sizes is an important tool for practical machine learning as it enables the trade-off between batch size and wall-clock time. This trade-off is typically enabled by a scaling rule; for example, in stochastic gradient descent, one should scale the learning rate linearly with the batch size. Another important machine learning tool is the model EMA, a functional copy of a target model whose parameters move towards those of its target model according to an Exponential Moving Average (EMA) at a rate parameterized by a momentum…
AI Generated Robotic Content

Recent Posts

Best Apple Watch Bands of 2026: Nike, Hermés, and More

We’ve been testing bands since the first Apple Watch launched in 2015. From silicone sports…

13 hours ago

Model Drop | ZIT + LTX 2.3 + Music Video | Arca Gidan contest

The idea came from something I'm pretty sure most of us live every single day:…

1 day ago

Sonos Play Review: Performance Meets Convenience

With great sound and versatility, this new speaker may be Sonos’ best.

2 days ago

AI companions can comfort lonely users but may deepen distress over time

AI companions are always available, never judge, never tire and never demand anything in return.…

2 days ago