Code-switching as a cross-lingual Training Signal: an Example with Unsupervised Bilingual Embedding

Gaschi et al. 2023
Workshop paper - MRL@EMNLP2023

Abstract

Code-switching is the occurrence of words from different languages in the same utterance. This paper shows that code-switching is widely present in a popular dataset for training word embeddings, and demonstrates that it can serve as a training signal for unsupervised cross-lingual embeddings. The proposed method for leveraging this signal outperforms other unsupervised mapping-based methods for cross-lingual embeddings on two of the three tested language pairs, suggesting that code-switching can be a useful training signal for multilingual representations.

Keywords: code-switching, word embeddings, multilingual alignment, unsupervised mapping