Contrasting Multiple Representations with the Multi-Marginal Matching Gap
Learning meaningful representations of complex objects that can be seen through multiple (k≥3kgeq 3k≥3) views or modalities is a core task in machine learning. Existing methods use losses originally intended for paired views, and extend them to kkk views, either by instantiating 12k(k−1)tfrac12k(k-1)21k(k−1) loss-pairs, or by using reduced embeddings, following a one vs. average-of-resttextit{one vs. average-of-rest}one vs. average-of-rest strategy. We propose the multi-marginal matching gap (M3G), a loss that borrows tools from multi-marginal optimal transport (MM-OT) theory to…
Learning disentangled representations from unlabelled data is a fundamental challenge in machine learning. Solving it may unlock other problems, such as generalization, interpretability, or fairness. Although remarkably challenging to solve in theory, disentanglement is often achieved in practice through prior matching. Furthermore, recent works have shown that prior matching approaches…
Estimating dimensional emotions, such as activation, valence and dominance, from acoustic speech signals has been widely explored over the past few years. While accurate estimation of activation and dominance from speech seem to be possible, the same for valence remains challenging. Previous research has shown that the use of lexical…
Contrastive learning typically matches pairs of related views among a number of unrelated negative views. Views can be generated (e.g. by augmentations) or be observed. We investigate matching when there are more than two related views which we call poly-view tasks, and derive new representation learning objectives using information maximization…