ActionBytes: Learning from Trimmed Videos to Localize Actions
| Authors | |
|---|---|
| Publication date | 2020 |
| Book title | 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition |
| Book subtitle | proceedings : virtual, 14-19 June 2020 |
| ISBN | |
| ISBN (electronic) | |
| Series | CVPR |
| Event | 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition |
| Pages (from-to) | 1168-1177 |
| Publisher | Los Alamitos, California: IEEE Computer Society |
| Organisations | |
| Abstract | This paper tackles the problem of localizing actions in long untrimmed videos. Unlike existing works, which all use annotated untrimmed videos during training, we learn only from short trimmed videos. This enables learning from large-scale datasets originally designed for action classification. We propose a method to train an action localization network that segments a video into interpretable fragments, which we call ActionBytes. Our method jointly learns to cluster ActionBytes and trains the localization network using the cluster assignments as pseudo-labels. By doing so, we train on short trimmed videos that become untrimmed for ActionBytes. In isolation, or when merged, the ActionBytes also serve as effective action proposals. Experiments demonstrate that our boundary-guided training generalizes to unknown action classes and localizes actions in long videos of Thumos14, MultiThumos, and ActivityNet1.2. Furthermore, we show the advantage of ActionBytes for zero-shot localization as well as for traditional weakly supervised localization, which trains on long videos, achieving state-of-the-art results. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.1109/CVPR42600.2020.00125 |
| Downloads | 09157526 (Final published version) |
