Linguistic issues behind visual question answering
| Authors |
|
|---|---|
| Publication date | 06-2021 |
| Journal | Language and Linguistics Compass |
| Article number | e12417 |
| Volume | 15 |
| Issue number | 6 |
| Number of pages | 25 |
| Organisations |
|
| Abstract |
Answering a question that is grounded in an image is a crucial
ability that requires understanding the question, the visual context,
and their interaction at many linguistic levels: among others,
semantics, syntax and pragmatics. As such, visually-grounded questions
have long been of interest to theoretical linguists and cognitive
scientists. Moreover, they have inspired the first attempts to
computationally model natural language understanding, where pioneering
systems were faced with the highly challenging task—still unsolved—of
jointly dealing with syntax, semantics and inference whilst
understanding a visual context. Boosted by impressive advancements in
machine learning, the task of answering visually-grounded questions has
experienced a renewed interest in recent years, to the point of becoming
a research sub-field at the intersection of computational linguistics
and computer vision. In this paper, we review current approaches to the
problem which encompass the development of datasets, models and
frameworks. We conduct our investigation from the perspective of
theoretical linguists; we extract from pioneering computational
linguistic work a list of desiderata that we use to review
current computational achievements. We acknowledge that impressive
progress has been made to reconcile the engineering with the theoretical
view. At the same time, we claim that further research is needed to get
to a unified approach which jointly encompasses all the underlying
linguistic problems. We conclude the paper by sharing our own desiderata
for the future.
|
| Document type | Article |
| Language | English |
| Published at | https://doi.org/10.1111/lnc3.12417 |
| Downloads |
Language and Linguist Compass - 2021 - Bernardi - Linguistic issues behind visual question answering
(Final published version)
|
