
How to Scale Your EMA

*=Equal Contributors
Preserving training dynamics across batch sizes is an important tool for practical machine learning as it enables the trade-off between batch size and wall-clock time. This trade-off is typically enabled by a scaling rule; for example, in stochastic gradient descent, one should scale the learning rate linearly with the batch size. Another important machine learning tool is the model EMA, a functional copy of a target model whose parameters move towards those of its target model according to an Exponential Moving Average (EMA) at a rate parameterized by a momentum…
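The two tools described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the function and parameter names (`scale_lr`, `ema_update`, `momentum`) are my own:

```python
def scale_lr(base_lr, base_batch_size, new_batch_size):
    """Linear scaling rule for SGD: scale the learning rate
    proportionally with the batch size."""
    return base_lr * (new_batch_size / base_batch_size)


def ema_update(ema_params, model_params, momentum=0.999):
    """One EMA step: each EMA parameter moves toward its target
    model's parameter at a rate set by the momentum."""
    return [momentum * e + (1.0 - momentum) * p
            for e, p in zip(ema_params, model_params)]
```

For example, a base learning rate of 0.1 at batch size 256 becomes 0.4 at batch size 1024 under the linear rule, while with momentum 0.999 each EMA parameter absorbs 0.1% of its target's value per step.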
AI Generated Robotic Content
