Time-Aware Chi-squared for Document Filtering over Time

Open Access
Authors
Publication date 2013
Book title SIGIR 2013 Workshop on Time-aware Information Access: #TAIA2013: August 1, 2013
Event SIGIR 2013 Workshop on Time Aware Information Access (#TAIA2013)
Pages (from-to) [18-21]
Publisher Microsoft Research
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Document filtering over time is applied in tasks such as tracking topics in online news or social media. We consider it a classification task, where topics of interest correspond to classes, and the feature space consists of the words associated with each class. In streaming settings, the set of words associated with a concept may change. In this paper we employ a multinomial Naive Bayes classifier and perform periodic feature selection to adapt to evolving topics. We propose two ways of employing Pearson's χ² test for feature selection and demonstrate their benefit on the TREC KBA 2012 data set. By incorporating a time-dependent function in our equations for χ², we provide an elegant method for applying different weighting and windowing schemes. Experiments show improvements of our approach over a non-adaptive baseline in a realistic setting with limited amounts of training data.
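The abstract does not reproduce the paper's equations, so the following is only a minimal sketch of one way a time-dependent function could enter a χ² computation: each document contributes a weight, rather than a count of 1, to the 2x2 term/class contingency table. The names `time_weight` and `weighted_chi2`, the `half_life` and `window` parameters, and the exponential-decay form are all assumptions made for illustration, not the authors' formulation.

```python
def time_weight(age, half_life=None, window=None):
    """Hypothetical time-dependent weight for a document of the given age."""
    if window is not None and age > window:
        return 0.0  # hard window: ignore documents older than `window`
    if half_life is None:
        return 1.0  # uniform weighting recovers ordinary counts
    return 0.5 ** (age / half_life)  # exponential decay (assumed form)


def weighted_chi2(docs, term, now, **weight_args):
    """Chi-squared score of `term` for the relevant class, using
    time-weighted counts in place of raw document counts.

    docs: iterable of (timestamp, set_of_terms, is_relevant) tuples.
    """
    # 2x2 contingency table:
    #   a = term present, relevant      b = term present, not relevant
    #   c = term absent,  relevant      d = term absent,  not relevant
    a = b = c = d = 0.0
    for timestamp, terms, is_relevant in docs:
        w = time_weight(now - timestamp, **weight_args)
        if term in terms:
            if is_relevant:
                a += w
            else:
                b += w
        else:
            if is_relevant:
                c += w
            else:
                d += w
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: the term carries no signal
    return n * (a * d - b * c) ** 2 / denom


# Toy stream: two old documents (t=0) and two recent ones (t=10).
docs = [
    (0, {"x"}, True),
    (0, {"y"}, False),
    (10, {"x"}, True),
    (10, {"y"}, False),
]
uniform = weighted_chi2(docs, "x", now=10)
windowed = weighted_chi2(docs, "x", now=10, window=5)
decayed = weighted_chi2(docs, "x", now=10, half_life=10)
```

In this sketch, switching between uniform counting, a hard window, and smooth decay only requires changing the arguments passed to the weight function, which is the kind of flexibility the abstract attributes to a time-dependent function.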
Document type Conference contribution
Language English
Published at http://research.microsoft.com/en-us/people/milads/taia2013.proceedings.final.pdf