Cross-Modal Conceptualization in Bottleneck Models

D. Alukaev; S. Kiselev; I. Pershin; B. Ibragimov; V. Ivanov; A. Kornaev; I. Titov

DOI: https://doi.org/10.18653/v1/2023.emnlp-main.318

Authors	D. Alukaev; S. Kiselev; I. Pershin; B. Ibragimov; V. Ivanov; A. Kornaev; I. Titov
Publication date	2023
Host editors	H. Bouamor; J. Pino; K. Bali
Book title	The 2023 Conference on Empirical Methods in Natural Language Processing
Book subtitle	EMNLP 2023 : Proceedings of the Conference : December 6-10, 2023
ISBN (electronic)	9798891760608
Event	2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023
Pages (from-to)	5241-5253
Number of pages	13
Publisher	Stroudsburg, PA: Association for Computational Linguistics
Organisations	Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract	Concept Bottleneck Models (CBMs) (Koh et al., 2020) assume that training examples (e.g., x-ray images) are annotated with high-level concepts (e.g., types of abnormalities), and perform classification by first predicting the concepts, followed by predicting the label relying on these concepts. The main difficulty in using CBMs comes from having to choose concepts that are predictive of the label and then having to label training examples with these concepts. In our approach, we adopt a more moderate assumption and instead use text descriptions (e.g., radiology reports), accompanying the images in training, to guide the induction of concepts. Our cross-modal approach treats concepts as discrete latent variables and promotes concepts that (1) are predictive of the label, and (2) can be predicted reliably from both the image and text. Through experiments conducted on datasets ranging from synthetic datasets (e.g., synthetic images with generated descriptions) to realistic medical imaging datasets, we demonstrate that cross-modal learning encourages the induction of interpretable concepts while also facilitating disentanglement. Our results also suggest that this guidance leads to increased robustness by suppressing the reliance on shortcut features.
Document type	Conference contribution
Note	With supplementary video
Language	English
Published at	https://doi.org/10.18653/v1/2023.emnlp-main.318 (Final published version)
Other links	https://www.scopus.com/pages/publications/85184816195
Downloads	2023.emnlp-main.318 (Final published version)
Supplementary materials	2023.emnlp-main.318
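
Illustrative sketch (not part of the repository record and not the authors' implementation): the abstract describes a bottleneck in which discrete concepts are predicted first, the label is predicted from those concepts alone, and concepts are favoured when both the image and the text predict them reliably. The toy Python below shows only that data flow. All weights, feature vectors, and function names (`predict_concepts`, `predict_label`, `agreement`) are hypothetical, and the agreement score is a simple stand-in for the cross-modal objective used in the paper, which may differ substantially.

```python
import math


def sigmoid(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_concepts(features, weights, biases):
    """Map a feature vector to binary concepts, one logistic unit per concept."""
    concepts = []
    for w_row, b in zip(weights, biases):
        score = sum(w * x for w, x in zip(w_row, features)) + b
        concepts.append(1 if sigmoid(score) >= 0.5 else 0)
    return concepts


def predict_label(concepts, label_weights, label_bias):
    """Predict a binary label from the concepts alone (the bottleneck)."""
    score = sum(w * c for w, c in zip(label_weights, concepts)) + label_bias
    return 1 if score >= 0 else 0


def agreement(concepts_a, concepts_b):
    """Fraction of concepts on which two modalities agree."""
    matches = sum(a == b for a, b in zip(concepts_a, concepts_b))
    return matches / len(concepts_a)


# Hypothetical toy parameters: two concepts, 2-d image features, 3-d text features.
IMAGE_W = [[2.0, -1.0], [-1.5, 1.0]]
IMAGE_B = [0.0, 0.0]
TEXT_W = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
TEXT_B = [0.0, -0.5]
LABEL_W = [1.0, -1.0]
LABEL_B = -0.5

# Hypothetical features for one training example and its accompanying description.
image_features = [1.0, 0.5]
text_features = [1.0, 0.0, 0.2]

image_concepts = predict_concepts(image_features, IMAGE_W, IMAGE_B)
text_concepts = predict_concepts(text_features, TEXT_W, TEXT_B)
label = predict_label(image_concepts, LABEL_W, LABEL_B)

print("concepts from image:", image_concepts)
print("concepts from text: ", text_concepts)
print("cross-modal agreement:", agreement(image_concepts, text_concepts))
print("label from image concepts:", label)
```

In this sketch the label depends on the input only through `image_concepts`, which is what makes the model a bottleneck; the text branch is used solely to check that the same concepts are recoverable from the description.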
