ActionBytes: Learning from Trimmed Videos to Localize Actions

Open Access
Authors
Publication date 2020
Book title 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition
Book subtitle proceedings : virtual, 14-19 June 2020
ISBN
  • 9781728171692
ISBN (electronic)
  • 9781728171685
Series CVPR
Event 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition
Pages (from-to) 1168-1177
Publisher Los Alamitos, California: IEEE Computer Society
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
This paper tackles the problem of localizing actions in long untrimmed videos. Unlike existing works, which all use annotated untrimmed videos during training, we learn only from short trimmed videos. This enables learning from large-scale datasets originally designed for action classification. We propose a method to train an action localization network that segments a video into interpretable fragments, which we call ActionBytes. Our method jointly learns to cluster ActionBytes and trains the localization network using the cluster assignments as pseudo-labels. By doing so, we train on short trimmed videos that become untrimmed for ActionBytes. In isolation, or when merged, the ActionBytes also serve as effective action proposals. Experiments demonstrate that our boundary-guided training generalizes to unknown action classes and localizes actions in long videos of Thumos14, MultiThumos, and ActivityNet1.2. Furthermore, we show the advantage of ActionBytes for zero-shot localization as well as for traditional weakly supervised localization, which trains on long videos, achieving state-of-the-art results.
Document type Conference contribution
Language English
Published at https://doi.org/10.1109/CVPR42600.2020.00125
Downloads
09157526 (Final published version)