Towards Open-Vocabulary Video Instance Segmentation

Open Access
Authors
Publication date 2023
Book title 2023 IEEE/CVF International Conference on Computer Vision
Book subtitle ICCV 2023 : Paris, France, 2-6 October 2023 : proceedings
ISBN
  • 9798350307191
ISBN (electronic)
  • 9798350307184
Event 2023 IEEE/CVF International Conference on Computer Vision (ICCV)
Pages (from-to) 4034-4043
Publisher Los Alamitos, California: IEEE Computer Society
Organisations
  • Faculty of Economics and Business (FEB) - Amsterdam Business School Research Institute (ABS-RI)
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Video Instance Segmentation (VIS) aims at segmenting and categorizing objects in videos from a closed set of training categories, lacking the generalization ability to handle novel categories in real-world videos. To address this limitation, we make the following three contributions. First, we introduce the novel task of Open-Vocabulary Video Instance Segmentation, which aims to simultaneously segment, track, and classify objects in videos from open-set categories, including novel categories unseen during training. Second, to benchmark Open-Vocabulary VIS, we collect a Large-Vocabulary Video Instance Segmentation dataset (LV-VIS) that contains well-annotated objects from 1,196 diverse categories, significantly surpassing the category size of existing datasets by more than one order of magnitude. Third, we propose an efficient Memory-Induced Transformer architecture, OV2Seg, to first achieve Open-Vocabulary VIS in an end-to-end manner with near real-time inference speed. Extensive experiments on LV-VIS and four existing VIS datasets demonstrate the strong zero-shot generalization ability of OV2Seg on novel categories. The dataset and code are released at https://github.com/haochenheheda/LVVIS.
Document type Conference contribution
Note With supplemental file
Language English
Published at https://doi.org/10.48550/arXiv.2304.01715 https://doi.org/10.1109/ICCV51070.2023.00375
Published at https://openaccess.thecvf.com/content/ICCV2023/html/Wang_Towards_Open-Vocabulary_Video_Instance_Segmentation_ICCV_2023_paper.html
Other links https://github.com/haochenheheda/LVVIS https://www.proceedings.com/72328.html