Evaluating the Representational Hub of Language and Vision Models

Open Access
Authors
Publication date 2019
Host editors
  • S. Dobnik
  • S. Chatzikyriakidis
  • V. Demberg
Book title Proceedings of the 13th International Conference on Computational Semantics - Long Papers
Book subtitle IWCS 2019 : 23-27 May, 2019, University of Gothenburg, Gothenburg, Sweden
ISBN (electronic)
  • 9781950737192
Event 13th International Conference on Computational Semantics
Pages (from-to) 211-222
Publisher Stroudsburg, PA: The Association for Computational Linguistics
Organisations
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
  • Faculty of Science (FNWI)
Abstract
The multimodal models used in the emerging field at the intersection of computational linguistics and computer vision implement the bottom-up processing of the “Hub and Spoke” architecture proposed in cognitive science to represent how the brain processes and combines multi-sensory inputs. In particular, the Hub is implemented as a neural network encoder. We investigate the effect on this encoder of various vision-and-language tasks proposed in the literature: visual question answering, visual reference resolution, and visually grounded dialogue. To measure the quality of the representations learned by the encoder, we use two kinds of analyses. First, we evaluate the encoder pre-trained on the different vision-and-language tasks on an existing “diagnostic task” designed to assess multimodal semantic understanding. Second, we carry out a battery of analyses aimed at studying how the encoder merges and exploits the two modalities.
Document type Conference contribution
Language English
Published at https://doi.org/10.18653/v1/W19-0418