OV-VIS: Open-Vocabulary Video Instance Segmentation

Open Access
Authors
  • X. Tang
  • Y. Hu
  • G. Kang
  • W. Xie
  • E. Gavves
Publication date 11-2024
Journal International Journal of Computer Vision
Volume | Issue number 132 | 11
Pages (from-to) 5048-5065
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Conventionally, Video Instance Segmentation (VIS) aims to segment and categorize objects in videos from a closed set of training categories, and therefore lacks the generalization ability to handle novel categories in real-world videos. To address this limitation, we make the following three contributions. First, we introduce the novel task of Open-Vocabulary Video Instance Segmentation (OV-VIS), which aims to simultaneously segment, track, and classify objects in videos from open-set categories, including novel categories unseen during training. Second, to benchmark OV-VIS, we collect a Large-Vocabulary Video Instance Segmentation dataset (LV-VIS) that contains well-annotated objects from 1196 diverse categories, surpassing the category count of existing datasets by more than an order of magnitude. Third, we propose a transformer-based OV-VIS model, OV2Seg+, which associates per-frame segmentation masks with a memory-induced transformer and classifies objects in videos with a voting module given language guidance. In addition, to monitor progress, we set up evaluation protocols for OV-VIS and propose a set of strong baseline models to facilitate future endeavors. Extensive experiments on LV-VIS and four existing VIS datasets demonstrate the strong zero-shot generalization ability of OV2Seg+. The dataset and code are released at https://github.com/haochenheheda/LVVIS. The competition website is available at https://www.codabench.org/competitions/1748.
Document type Article
Language English
Published at https://doi.org/10.1007/s11263-024-02076-w
Other links https://github.com/haochenheheda/LVVIS
Downloads
OV-VIS (Final published version)