Feature selection and data sampling methods for learning reputation dimensions: The University of Amsterdam at RepLab 2014

Open Access
Authors
Publication date 2014
Host editors
  • L. Cappellato
  • N. Ferro
  • M. Halvey
  • W. Kraaij
Book title Working Notes for CLEF 2014 Conference
Book subtitle Sheffield, UK, September 15-18, 2014
Series CEUR Workshop Proceedings
Event CLEF 2014 Labs and Workshop
Pages (from-to) 1479-1490
Publisher Aachen: CEUR-WS
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
We report on our participation in the reputation dimension task of the CLEF RepLab 2014 evaluation initiative, which requires classifying social media updates into eight predefined categories. We address the task by using corpus-based methods to extract textual features from the labeled training data to train two classifiers in a supervised way. We explore three sampling strategies for selecting training examples, and probe their effect on classification performance. We find that all our submitted runs outperform the baseline, and that elaborate feature selection methods coupled with balanced datasets help improve classification accuracy.
Document type Conference contribution
Language English
Published at http://ceur-ws.org/Vol-1180/CLEF2014wn-Rep-GarbaceaEt2014.pdf
Other links http://ceur-ws.org/Vol-1180
Downloads
CLEF2014wn-Rep-GarbaceaEt2014 (Final published version)