
How to Scale Your EMA

*=Equal Contributors
Preserving training dynamics across batch sizes is an important tool for practical machine learning, as it enables trading off batch size against wall-clock time. This trade-off is typically enabled by a scaling rule; for example, in stochastic gradient descent, one should scale the learning rate linearly with the batch size. Another important machine learning tool is the model EMA, a functional copy of a target model whose parameters move toward those of the target according to an Exponential Moving Average (EMA) at a rate parameterized by a momentum…
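As a minimal sketch of the two ingredients the abstract describes, the snippet below shows the linear learning-rate scaling rule for SGD and a plain model-EMA update. The names (`BASE_LR`, `BASE_BATCH`, `scaled_lr`, `ema_update`) and the concrete numbers are illustrative assumptions, not values from the paper:

```python
# Hypothetical base configuration, chosen for illustration only.
BASE_LR = 0.1      # learning rate tuned at the base batch size
BASE_BATCH = 256   # batch size at which BASE_LR was tuned

def scaled_lr(batch_size: float) -> float:
    """Linear scaling rule for SGD: when the batch size grows by a
    factor kappa, scale the learning rate by the same factor."""
    kappa = batch_size / BASE_BATCH
    return BASE_LR * kappa

def ema_update(ema_params, target_params, momentum: float = 0.999):
    """Model EMA: each EMA parameter moves toward its target parameter
    at a rate set by the momentum hyperparameter (higher momentum means
    slower movement)."""
    return [
        momentum * e + (1.0 - momentum) * p
        for e, p in zip(ema_params, target_params)
    ]
```

For example, doubling the batch size from 256 to 512 doubles the learning rate, while the EMA parameters move only a fraction `1 - momentum` of the way toward the target each step.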
