Oxford TRECVID 2006 - Notebook paper
| Authors | |
|---|---|
| Publication date | 2006 |
| Book title | Proceedings of the 4th TRECVID Workshop |
| Publisher | Gaithersburg, USA: NIST |
| Organisations | |
| Abstract | The Oxford team participated in the high-level feature extraction and interactive search tasks. A vision-only approach was used for both tasks, with no use of the text or audio information. For the high-level feature extraction task, we used two different approaches, one using sparse and one using dense visual features, to learn classifiers for all 39 required concepts, using the training data supplied by MediaMill [Snoek et al. '06] for the 2005 data. In addition, we used a face-specific classifier, with features computed for specific facial parts, to facilitate answering people-dependent queries such as "government leader". We submitted three runs for this task. OXVGG_A used the dense visual features only. OXVGG_OJ used the sparse visual features for all concepts except "government leader", "face" and "person", where we prepended the results from the face classifier. OXVGG_AOJ applied rank fusion to merge the outputs of the sparse and dense methods, with weightings tuned on the training data, and also prepended the face results for "face", "person" and "government leader". In general, the sparse features tended to perform best on the more object-based concepts, such as "US flag", while the dense features performed slightly better on more scene-based concepts, such as "military". Overall, the fused run did best with a Mean Average (inferred) Precision (MAP) of 0.093; the sparse run came second with a MAP of 0.080, followed by the dense run with a MAP of 0.053. For the interactive search task, we coupled the results generated during the high-level feature extraction task with methods to facilitate efficient and productive interactive search. Our system allowed for several "expansion" methods based on the sparse and dense features, as well as a novel on-the-fly face classification system, which coupled a Google Images search with rapid Support Vector Machine (SVM) training and testing to return results containing a particular person within a few minutes. We submitted just one run, OXVGG_TVI, which performed well, winning two categories and coming above the median in 18 out of 24 queries. |
| Document type | Conference contribution |
| Language | English |
| Published at | http://www.robots.ox.ac.uk/~vgg/publications/papers/philbin06.pdf |
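
The abstract describes the OXVGG_AOJ run as a weighted rank fusion of the sparse and dense outputs, with face-classifier results prepended for some concepts, but it does not give the fusion formula. The following is a minimal sketch of one common way to do this, assuming a weighted Borda-style score over ranked shot lists. The function names, shot IDs and weights are hypothetical illustrations and are not taken from the paper.

```python
from typing import Dict, List


def weighted_rank_fusion(rankings: Dict[str, List[str]],
                         weights: Dict[str, float]) -> List[str]:
    """Fuse ranked lists of shot IDs with a weighted Borda-style score.

    Assumption: each method contributes weight * (n - position) / n for a
    shot at 0-based `position` in its list of length n; shots missing from
    a list contribute nothing for that method.
    """
    scores: Dict[str, float] = {}
    for method, ranked in rankings.items():
        n = len(ranked)
        for position, shot in enumerate(ranked):
            contribution = weights[method] * (n - position) / n
            scores[shot] = scores.get(shot, 0.0) + contribution
    # Highest fused score first; ties broken by shot ID for determinism.
    return sorted(scores, key=lambda shot: (-scores[shot], shot))


def prepend(priority: List[str], ranked: List[str]) -> List[str]:
    """Place `priority` results first, then the rest of `ranked` in order."""
    seen = set(priority)
    return priority + [shot for shot in ranked if shot not in seen]


# Hypothetical ranked outputs for one concept (best shot first).
rankings = {
    "sparse": ["s1", "s2", "s3"],
    "dense": ["s3", "s1", "s4"],
}
# Hypothetical weights; the paper tuned its weightings on training data.
weights = {"sparse": 0.6, "dense": 0.4}

fused = weighted_rank_fusion(rankings, weights)
with_faces = prepend(["s4"], fused)  # e.g. face-classifier hits go first
print(fused)
print(with_faces)
```

In this sketch the weights play the role of the tuned weightings mentioned in the abstract; in practice they would be chosen per concept by maximising average precision on held-out training data.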