Analyzing the Effect of Linguistic Similarity on Cross-Lingual Transfer: Tasks and Input Representations Matter
Cross-lingual transfer is a popular approach to increase the amount of training data for NLP tasks in a low-resource context. However, the best strategy for deciding which cross-lingual data to include remains unclear. Prior research has often focused on a small set of languages from a few language families, or on a single task. How these findings extend to a wider variety of languages and tasks is still an open question. In this work, we contribute to answering this question by analyzing cross-lingual transfer for 263 languages from a wide variety of language families. Moreover, we include three popular NLP tasks…
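A minimal sketch of one way to operationalize the selection problem the abstract raises: rank candidate transfer languages by typological similarity to the target language. The binary feature vectors and language codes below are hypothetical stand-ins for real typological features (e.g., WALS/URIEL-style vectors) and are not taken from the paper.

```python
# Rank candidate transfer languages by similarity to a target language.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical typological feature vectors (1 = feature present).
features = {
    "deu": np.array([1, 0, 1, 1, 0], dtype=float),
    "nld": np.array([1, 0, 1, 0, 0], dtype=float),
    "tur": np.array([0, 1, 0, 1, 1], dtype=float),
}
target = np.array([1, 0, 1, 1, 1], dtype=float)  # hypothetical target language

# Candidates sorted from most to least similar to the target.
ranking = sorted(
    features,
    key=lambda lang: cosine_similarity(features[lang], target),
    reverse=True,
)
print(ranking)  # e.g., ['deu', 'nld', 'tur']
```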
In this work, we analyze a pre-trained mT5 model to discover the attributes of cross-lingual connections it has learned. Using a statistical interpretation framework over 90 language pairs across three tasks, we show that transfer performance can be modeled by a few linguistic and data-derived features. These observations enable us…
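The following sketch illustrates the kind of statistical modeling the abstract describes: regressing observed transfer performance on a handful of linguistic and data-derived features. The feature choices and all numbers are synthetic placeholders, not results from the paper.

```python
# Toy linear model of transfer performance from pair-level features.
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row describes one source-target language pair with hypothetical
# features: [syntactic distance, log source-data size, subword overlap].
X = np.array([
    [0.2, 5.1, 0.60],
    [0.5, 4.3, 0.35],
    [0.8, 6.0, 0.10],
    [0.3, 5.5, 0.55],
])
y = np.array([71.2, 63.5, 52.0, 69.8])  # synthetic transfer scores

model = LinearRegression().fit(X, y)

# The fitted coefficients indicate how strongly each feature is
# associated with transfer performance under this toy model.
print(dict(zip(["syn_dist", "log_size", "overlap"], model.coef_)))
print("R^2:", model.score(X, y))
```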
This article introduces contrastive alignment instructions (AlignInstruct) to address two challenges in machine translation (MT) on large language models (LLMs). One is the expansion of supported languages to previously unseen ones. The second relates to the lack of data in low-resource languages. Model fine-tuning through MT instructions (MTInstruct) is a…
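Since the abstract mentions fine-tuning through MT instructions (MTInstruct), a minimal sketch of what one such instruction-tuning example might look like follows. The template and field names are assumptions for illustration; the paper does not specify this exact format.

```python
# Build one hypothetical MT-instruction example for LLM fine-tuning.
def mt_instruction(src_lang: str, tgt_lang: str,
                   src_text: str, tgt_text: str) -> dict:
    """Return a single instruction-tuning record for translation."""
    return {
        "instruction": f"Translate the following {src_lang} sentence into {tgt_lang}.",
        "input": src_text,
        "output": tgt_text,
    }

example = mt_instruction("German", "English", "Guten Morgen.", "Good morning.")
print(example["instruction"])
print(example["input"], "->", example["output"])
```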