Intriguing Properties of Hyperbolic Embeddings in Vision-Language Models

S. Ibrahimi; M. Ghadimi Atigh; N. van Noord; P. Mettes; M. Worring

Authors	S. Ibrahimi; M. Ghadimi Atigh; N. van Noord; P. Mettes; M. Worring
Publication date	07-2024
Journal	Transactions on Machine Learning Research
Article number	2113
Volume | Issue number	2024
Number of pages	22
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	Vision-language models have in a short time been established as powerful networks, demonstrating strong performance on a wide range of downstream tasks. A key factor behind their success is the learning of a joint embedding space where pairs of images and textual descriptions are contrastively aligned. Recent work has explored the geometry of the joint embedding space, finding that hyperbolic embeddings provide a compelling alternative to the commonly used Euclidean embeddings. Specifically, hyperbolic embeddings yield improved zero-shot generalization, better visual recognition, and more consistent semantic interpretations. In this paper, we conduct a deeper study into the hyperbolic embeddings and find that they open new doors for vision-language models. In particular, we find that hyperbolic vision-language models provide spatial awareness that Euclidean vision-language models lack, are better capable of dealing with ambiguity, and effectively discriminate between distributions. Our findings shed light on the greater potential of hyperbolic embeddings in large-scale settings, reaching beyond conventional downstream tasks.
Document type	Article
Language	English
Published at	https://openreview.net/forum?id=P5D2gfi4Gg (Final published version)
Other links	https://github.com/saibr/hypvl http://jmlr.org/tmlr/papers/