ZS-NMT-Variations, EC40 Multilingual Machine Translation Dataset/Benchmark

Creators
Publication date 2023
Description EC40 is a multilingual neural machine translation (MNMT) training dataset intended to support the study of MNMT and zero-shot NMT. It contains 66 million English-centric sentences covering 40 languages (excluding English) across 5 language families, sampled from the OPUS corpus.
Publisher GitHub
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Document type Dataset
Related publication Towards a Better Understanding of Variations in Zero-Shot Neural Machine Translation Performance
Other links https://github.com/Smu-Tan/ZS-NMT-Variations.git