
How to Scale Your EMA

*=Equal Contributors
Preserving training dynamics across batch sizes is an important tool for practical machine learning, as it enables trading off batch size against wall-clock time. This trade-off is typically enabled by a scaling rule; for example, in stochastic gradient descent, one should scale the learning rate linearly with the batch size. Another important machine learning tool is the model EMA, a functional copy of a target model whose parameters move towards those of the target according to an Exponential Moving Average (EMA) at a rate parameterized by a momentum…
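As a minimal sketch of the two rules the abstract mentions: when the batch size is scaled by a factor κ, the SGD learning rate scales linearly (lr → κ·lr), and the EMA momentum scales exponentially (ρ → ρ^κ), so that one large-batch EMA step tracks κ small-batch steps. The variable names and concrete values below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def ema_update(ema_params, model_params, momentum):
    """One EMA step: move the EMA parameters toward the target model's."""
    return momentum * ema_params + (1.0 - momentum) * model_params

# Hypothetical base hyperparameters and a batch-size scaling factor kappa.
base_lr, base_momentum, kappa = 0.1, 0.9999, 8

scaled_lr = kappa * base_lr              # linear scaling rule for SGD
scaled_momentum = base_momentum ** kappa  # exponential scaling rule for the EMA

# Sanity check of the EMA rule against a fixed target: kappa small-batch
# steps with momentum rho equal one large-batch step with momentum rho**kappa.
theta = np.array([1.0])   # frozen target parameters (illustrative)
ema_small = np.array([0.0])
for _ in range(kappa):
    ema_small = ema_update(ema_small, theta, base_momentum)
ema_large = ema_update(np.array([0.0]), theta, scaled_momentum)
```

With a frozen target both trajectories give 1 − ρ^κ exactly, which is why the exponential rule preserves the EMA's effective averaging horizon under batch-size changes.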
