Localizing Actions from Video Labels and Pseudo-Annotations

P. Mettes; C.G.M. Snoek; S.-F. Chang

doi:10.5244/C.31.22

Authors	P. Mettes; C.G.M. Snoek; S.-F. Chang
Publication date	2017
Host editors	T.K. Kim; S. Zafeiriou; G. Brostow; K. Mikolajczyk
Book title	Proceedings of the British Machine Vision Conference 2017
ISBN (electronic)	190172560X
Event	28th British Machine Vision Conference
Article number	22
Number of pages	12
Publisher	BMVA Press
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	The goal of this paper is to determine the spatio-temporal location of actions in video. Where training from hard-to-obtain box annotations is the norm, we propose an intuitive and effective algorithm that localizes actions from their class label only. We are inspired by recent work showing that unsupervised action proposals selected with human point-supervision perform as well as proposals selected with expensive box annotations. Rather than asking users to provide point supervision, we propose fully automatic visual cues that replace manual point annotations. We call the cues pseudo-annotations, introduce five of them, and propose a correlation metric for automatically selecting and combining them. Thorough evaluation on challenging action localization datasets shows that we reach results comparable to those obtained with full box supervision. We also show that pseudo-annotations can be leveraged during testing to improve weakly- and strongly-supervised localizers.
Document type	Conference contribution
Language	English
Published at	https://doi.org/10.5244/C.31.22
Other links	https://ivi.fnwi.uva.nl/isis/publications/2017/MettesBMVC2017
Downloads	mettes-pseudo-annotations-bmvc2017 (Final published version)