Examining the Tip of the Iceberg: A Data Set for Idiom Translation

M. Fadaee; A. Bisazza; C. Monz

Authors	M. Fadaee, A. Bisazza, C. Monz
Publication date	2018
Host editors	N. Calzolari, K. Choukri, C. Cieri, T. Declerck, S. Goggi, K. Hasida, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, S. Piperidis, T. Tokunaga
Book title	LREC 2018 : Eleventh International Conference on Language Resources and Evaluation
Book subtitle	May 7-12, 2018, Miyazaki, Japan
ISBN (electronic)	9791095546009
Event	11th Language Resources and Evaluation Conference
Pages (from-to)	925-929
Publisher	Paris: European Language Resources Association (ELRA)
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	Neural Machine Translation (NMT) has been widely used in recent years with significant improvements for many language pairs. Although state-of-the-art NMT systems are generating progressively better translations, idiom translation remains one of the open challenges in this field. Idioms, a category of multiword expressions, are an interesting language phenomenon where the overall meaning of the expression cannot be composed from the meanings of its parts. A first important challenge is the lack of dedicated data sets for learning and evaluating idiom translation. In this paper we address this problem by creating the first large-scale data set for idiom translation. Our data set is automatically extracted from a widely used German↔English translation corpus and includes, for each language direction, a targeted evaluation set where all sentences contain idioms and a regular training corpus where sentences including idioms are marked. We release this data set and use it to perform preliminary NMT experiments as the first step towards better idiom translation.
Document type	Conference contribution
Language	English
Published at	http://www.lrec-conf.org/proceedings/lrec2018/summaries/432.html (Final published version)
Other links	http://www.lrec-conf.org/proceedings/lrec2018/index.html
Downloads	432 (Final published version)
UvA-DARE

Digital Academic Repository
