Conversations Powered by Cross-Lingual Knowledge

Authors
Publication date 2021
Book title SIGIR '21
Book subtitle Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval: July 11-15, 2021, Virtual Event, Canada
ISBN (electronic)
  • 9781450380379
Event 44th International ACM SIGIR Conference on Research and Development in Information Retrieval
Pages (from-to) 1442-1451
Publisher New York, NY: Association for Computing Machinery
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Today's open-domain conversational agents increase the informativeness of generated responses by leveraging external knowledge. Most existing approaches work only in scenarios with a massive amount of monolingual knowledge sources. For languages with limited availability of knowledge sources, using knowledge in the same language to generate informative responses is not effective. To address this problem, we propose the task of cross-lingual knowledge grounded conversation (CKGC), in which we leverage large-scale knowledge sources in another language to generate informative responses. The CKGC task poses two main challenges: (1) knowledge selection and response generation in a cross-lingual setting; and (2) the lack of a test dataset for evaluation. To tackle the first challenge, we propose the curriculum self-knowledge distillation (CSKD) scheme, which utilizes a large-scale dialogue corpus in an auxiliary language to improve cross-lingual knowledge selection and knowledge expression in the target language via knowledge distillation. To tackle the second challenge, we collect a cross-lingual knowledge grounded conversation test dataset to facilitate relevant research in the future. Extensive experiments on the newly created dataset verify the effectiveness of our proposed curriculum self-knowledge distillation method for cross-lingual knowledge grounded conversation. In addition, we find that our proposed unsupervised method significantly outperforms the state-of-the-art baselines in cross-lingual knowledge selection.
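As background for readers unfamiliar with the distillation component the abstract mentions: the CSKD scheme builds on standard knowledge distillation, in which a student model is trained to match a teacher's temperature-softened output distribution. The sketch below shows only this generic distillation loss, not the paper's actual CSKD implementation; the function names and temperature value are illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax over a list of logits.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL(teacher || student) on temperature-softened distributions:
    # the standard knowledge-distillation objective. A higher temperature
    # exposes more of the teacher's "dark knowledge" in the soft targets.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# The loss is zero when the student matches the teacher, positive otherwise.
identical = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mismatched = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

In a cross-lingual setting such as CKGC, the teacher and student would operate over different languages, with the auxiliary-language model supervising knowledge selection and expression in the target language.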
Document type Conference contribution
Language English
Published at https://doi.org/10.1145/3404835.3462883