
Scaling Laws for Optimal Data Mixtures

Large foundation models are typically trained on data from multiple domains, with the data mixture—the proportion of each domain used—playing a critical role in model performance. The standard approach to selecting this mixture relies on trial and error, which becomes impractical for large-scale pretraining. We propose a systematic method to determine the optimal data mixture for any target domain using scaling laws. Our approach accurately predicts the loss of a model of size N trained with D tokens and a specific domain weight vector h. We validate the universality of these scaling laws by…
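The abstract describes predicting the loss of a model of size N trained on D tokens with domain weight vector h, then using that prediction to choose a mixture without trial-and-error training runs. Below is a minimal sketch of that workflow, assuming a Chinchilla-style power law in N and D plus a linear term in h; this functional form and all data are illustrative assumptions, not the paper's actual law or results.

```python
# Minimal sketch: fit a parametric loss law on small-scale runs, then query it
# at the target scale to rank candidate data mixtures. The form below
# (power laws in N and D plus a term linear in h) is an assumed stand-in for
# the paper's law; all "observed" losses here are synthetic.
import numpy as np
from scipy.optimize import curve_fit

def mixture_scaling_law(X, E, A, alpha, B, beta, c0, c1, c2):
    """L(N, D, h) = E + A/N^alpha + B/D^beta + c . h  (assumed form)."""
    N, D, h0, h1, h2 = X
    return E + A / N**alpha + B / D**beta + c0 * h0 + c1 * h1 + c2 * h2

rng = np.random.default_rng(0)
n_runs, n_domains = 64, 3
N = rng.uniform(1e7, 1e9, n_runs)              # model sizes of the small runs
D = rng.uniform(1e9, 1e11, n_runs)             # token counts of the small runs
h = rng.dirichlet(np.ones(n_domains), n_runs)  # domain weights, rows sum to 1
X = np.vstack([N, D, h.T])                     # predictors, shape (5, n_runs)

# Synthetic "observed" losses from a hidden ground-truth law plus noise.
true_params = (1.7, 400.0, 0.34, 1e3, 0.28, 0.20, -0.10, 0.05)
y = mixture_scaling_law(X, *true_params) + rng.normal(0, 0.01, n_runs)

# Fit the law to the small-scale runs.
p0 = [2.0, 100.0, 0.3, 500.0, 0.3, 0.0, 0.0, 0.0]
popt, _ = curve_fit(mixture_scaling_law, X, y, p0=p0, maxfev=50000)

# Rank candidate mixtures at a larger target scale without training there.
N_target, D_target = 7e9, 2e12
candidates = rng.dirichlet(np.ones(n_domains), 1000)
Xc = np.vstack([np.full(1000, N_target), np.full(1000, D_target), candidates.T])
pred = mixture_scaling_law(Xc, *popt)
print("predicted-best mixture:", np.round(candidates[np.argmin(pred)], 3))
```

Because the assumed form is linear in h, its exact minimizer always sits at a vertex of the probability simplex; the sketch only illustrates the fit-small, extrapolate, rank-mixtures workflow. A form in which h interacts with the N and D terms is needed for nontrivial interior optima.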