Tracking by Natural Language Specification

Open Access
Authors
Publication date 2017
Book title 30th IEEE Conference on Computer Vision and Pattern Recognition
Book subtitle CVPR 2017 : 21-26 July 2017, Honolulu, Hawaii : proceedings
ISBN
  • 9781538604588
ISBN (electronic)
  • 9781538604571
Event 2017 IEEE Conference on Computer Vision and Pattern Recognition
Pages (from-to) 7350-7358
Publisher Piscataway, NJ: IEEE
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
This paper strives to track a target object in a video. Rather than specifying the target in the first frame of a video by a bounding box, we propose to track the object based on a natural language specification of the target, which provides a more natural human-machine interaction as well as a means to improve tracking results. We define three variants of tracking by language specification: one relying on lingual target specification only, one relying on visual target specification based on language, and one leveraging their joint capacity. To show the potential of tracking by natural language specification, we extend two popular tracking datasets with lingual descriptions and report experiments. Finally, we also sketch new tracking scenarios in surveillance and other live video streams that become feasible with a lingual specification of the target.
Document type Conference contribution
Language English
Published at https://doi.org/10.1109/CVPR.2017.777
Other links https://ivi.fnwi.uva.nl/isis/publications/2017/LiCVPR2017
Downloads
Li_Tracking_by_Natural_CVPR_2017_paper (Accepted author manuscript)
Tracking by Natural Language Specification (Final published version)