Latent Variable Model for Multi-modal Translation

Open Access
Authors
  • Iacer Calixto
  • Miguel Rios
  • Wilker Aziz
Publication date 2019
Host editors
  • A. Korhonen
  • D. Traum
  • L. Màrquez
Book title The 57th Annual Meeting of the Association for Computational Linguistics
Book subtitle ACL 2019 : proceedings of the conference : July 28-August 2, 2019, Florence, Italy
ISBN (electronic)
  • 9781950737482
Event The 57th Annual Meeting of the Association for Computational Linguistics - ACL 2019
Pages (from-to) 6392–6405
Publisher Stroudsburg, PA: The Association for Computational Linguistics
Organisations
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract
In this work, we propose to model the interaction between visual and textual features for multi-modal neural machine translation (MMT) through a latent variable model. This latent variable can be seen as a multi-modal stochastic embedding of an image and its description in a foreign language. It is used in a target-language decoder and also to predict image features. Importantly, our model formulation utilises visual and textual inputs during training but does not require that images be available at test time. We show that our latent variable MMT formulation improves considerably over strong baselines, including a multi-task learning approach (Elliott and Kádár, 2017) and a conditional variational auto-encoder approach (Toyama et al., 2016). Finally, we show improvements due to (i) predicting image features in addition to conditioning on them, (ii) imposing a constraint on the KL term to promote models with a non-negligible mutual information between the inputs and the latent variable, and (iii) training on additional target-language image descriptions (i.e. synthetic data).
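As a reading aid, a minimal sketch of the kind of objective the abstract describes, in assumed notation (x: source sentence, y: target-language description, v: image features, z: the latent multi-modal embedding); the exact conditioning sets and parametrisation are specified in the paper itself:

  Generative model:  p(y, v, z \mid x) = p(z \mid x)\, p(y \mid x, z)\, p(v \mid z)

  ELBO:  \mathcal{L} = \mathbb{E}_{q(z \mid x, y, v)}\!\left[\log p(y \mid x, z) + \log p(v \mid z)\right] - \mathrm{KL}\!\left(q(z \mid x, y, v) \,\middle\|\, p(z \mid x)\right)

  KL constraint:  one standard way to keep the mutual information non-negligible is a free-bits-style bound, replacing the KL penalty with \max(\mathrm{KL}, \lambda) for a chosen minimum rate \lambda.

Under this factorisation the prior p(z \mid x) and the translation likelihood p(y \mid x, z) condition only on text, so images are needed to train the approximate posterior q and the image-prediction term but not at test time, consistent with the claim in the abstract.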
Document type Conference contribution
Language English
Published at https://doi.org/10.18653/v1/P19-1642
Other links https://github.com/iacercalixto/variational_mmt
Downloads
P19-1642 (Final published version)