How PARTs Assemble into Wholes: Learning the Relative Composition of Images
The composition of objects and their parts, along with object-object positional relationships, provides a rich source of information for representation learning. Hence, spatially aware pretext tasks have been actively explored in self-supervised learning. Existing works commonly start from a grid structure, where the pretext task is to predict the absolute position index of patches …