Categories: FAANG

How to Scale Your EMA

*=Equal Contributors
Preserving training dynamics across batch sizes is an important tool for practical machine learning as it enables the trade-off between batch size and wall-clock time. This trade-off is typically enabled by a scaling rule; for example, in stochastic gradient descent, one should scale the learning rate linearly with the batch size. Another important machine learning tool is the model EMA, a functional copy of a target model whose parameters move towards those of its target model according to an Exponential Moving Average (EMA) at a rate parameterized by a momentum…
AI Generated Robotic Content

Recent Posts

NeuralCompanion

NeuralCompanion is an open-source, local-first AI companion project for people who like building, experimenting, and…

11 hours ago

Oto Smart Sprinkler Review (2026): Solar-Powered and Simple to Use

The Oto Smart Sprinkler makes it easy to keep your lawn watered—as long as it…

12 hours ago

A lot of major updates on Flux Real-Time pipeline

Hello! Just a week ago I have posted here announce of my real-time streaming pipeline…

1 day ago

Old Oil and Gas Wells Could Find Second Life Producing Clean Energy

States across the US are looking to take major sources of pollution and use them…

1 day ago

It appears that Microsoft uploaded an image model on HuggingFace and then deleted it.

https://x.com/HuggingPapers/status/2055176632491778363 https://huggingface.co/microsoft/Lens https://huggingface.co/microsoft/Lens-Turbo submitted by /u/Total-Resort-3120 [link] [comments]

2 days ago

Restrict access to sensitive documents in your Amazon Quick knowledge bases for Amazon S3

Organizations that must restrict access to sensitive documents increasingly rely on AI-driven search and chat…

2 days ago