DataComp: In Search of the Next Generation of Multimodal Datasets
*=Equal Contributors
Multimodal datasets are a critical component in recent breakthroughs such as Stable Diffusion and GPT-4, yet their design does not receive the same research attention as model architectures or training algorithms. To address this shortcoming in the ML ecosystem, we introduce DataComp, a testbed for dataset experiments centered around a new candidate pool …