faculty: "FNWI" and publication year: "2010"
| Author||Steven Roebert|
|Title||Recognizing Human Actions in Movies : recognition and localization of realistic human actions|
|Supervisors||Ivo Everts, Theo Gevers|
|Faculty||Faculty of Science|
|Institute/dept.||FNWI: Instituut voor Informatica|
|Programme||FNWI MSc Artificial Intelligence|
|Abstract||Realistic human action recognition has become a popular topic in computer vision. What makes it challenging is that it differs from other types of action recognition, in which actions are generally recorded in a controlled environment. One important reason for its recent popularity is the release of the Hollywood2 Human Actions (HOHA-2) dataset, which consists of a large number of action samples obtained from several Hollywood movies.
In this paper we look into the definition of a realistic human action. A distinction is made between three different granularity levels of action recognition in movies: clip, shot and action.
Shots are generally a more natural level than clips, as a shot focuses on only one particular context and object or action. As the original HOHA-2 dataset only contains clip-level labels, we extended the dataset with new labels that contain the temporal extent of each action. This makes it possible to generate labels for each level of granularity.
These new labels allow for new recognition experiments at each level of granularity. In our approach to action recognition we use three different types of features. We build upon existing work in using content and context features, and we present style features, which are particularly focussed on movies. By combining these feature types, we manage to improve the overall recognition performance. Relative performance improvements of up to 10% in terms of mean average precision are achieved.
The addition of the new labels to the HOHA-2 dataset further allows for a simple form of action localization, in which the extent of an action is automatically segmented. We look at two different approaches to this localization task: conditional random fields (CRF) and an approach derived from dynamic programming. Localization results are not particularly good. The CRF approach is slightly favored, as it performs more stably than the dynamic programming approach.|
|Document type||Master's thesis|
Use this url to link to this page: http://dare.uva.nl/en/scriptie/359335