Seeing past words: Testing the cross-modal capabilities of pretrained V&L models on counting tasks

Open Access
Authors
Publication date 2021
Host editors
  • L. Donatelli
  • N. Krishnaswamy
  • K. Lai
  • J. Pustejovsky
Book title Multimodal Semantic Representations
Book subtitle Proceedings of the First Workshop : IWCS : June 16, 2021
ISBN (electronic)
  • 9781954085213
Event 1st Workshop on Multimodal Semantic Representations
Pages (from-to) 32-44
Publisher Stroudsburg, PA: The Association for Computational Linguistics
Organisations
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract
We investigate the reasoning ability of pretrained vision and language (V&L) models in two tasks that require multimodal integration: (1) discriminating a correct image-sentence pair from an incorrect one, and (2) counting entities in an image. We evaluate three pretrained V&L models on these tasks: ViLBERT, ViLBERT 12-in-1 and LXMERT, in zero-shot and finetuned settings. Our results show that models solve task (1) very well, as expected, since all models are pretrained on task (1). However, none of the pretrained V&L models is able to adequately solve task (2), our counting probe, and they cannot generalise to out-of-distribution quantities. We propose a number of explanations for these findings: LXMERT (and to some extent ViLBERT 12-in-1) show some evidence of catastrophic forgetting on task (1). Concerning our results on the counting probe, we find evidence that all models are impacted by dataset bias, and also fail to individuate entities in the visual input. While a selling point of pretrained V&L models is their ability to solve complex tasks, our findings suggest that understanding their reasoning and grounding capabilities requires more targeted investigations on specific phenomena.
Document type Conference contribution
Language English
Published at https://doi.org/10.48550/arXiv.2012.12352
Published at https://aclanthology.org/2021.mmsr-1.4