- Time-Aware Chi-squared for Document Filtering over Time
- SIGIR 2013 Workshop on Time Aware Information Access (#TAIA2013)
- Book/source title: SIGIR 2013 Workshop on Time-aware Information Access: #TAIA2013: August 1, 2013
- Microsoft Research
- Document type: Conference contribution
- Faculty of Science (FNWI)
- Informatics Institute (IVI)
Document filtering over time is applied in tasks such as tracking topics in online news or social media. We consider it a classification task, where topics of interest correspond to classes, and the feature space consists of the words associated with each class. In streaming settings the set of words associated with a concept may change. In this paper we employ a multinomial Naive Bayes classifier and perform periodic feature selection to adapt to evolving topics. We propose two ways of employing Pearson's χ² test for feature selection and demonstrate their benefit on the TREC KBA 2012 data set. By incorporating a time-dependent function in our equations for χ² we provide an elegant method for applying different weighting and windowing schemes. Experiments show improvements of our approach over a non-adaptive baseline, in a realistic setting with limited amounts of training data.
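To make the idea of a time-dependent function inside χ² concrete, the following is a minimal sketch and not the paper's actual formulation, which this abstract does not give. It assumes each document contributes a weight `weight_fn(age)` to a 2x2 term/class contingency table, and that the standard Pearson χ² statistic is then computed from the weighted counts. The names `exp_decay`, `window`, `time_weighted_chi2`, and the parameters `half_life` and `width` are hypothetical, chosen only for illustration.

```python
def exp_decay(age, half_life=7.0):
    """Assumed weighting scheme: a document's weight halves every `half_life` time units."""
    return 0.5 ** (age / half_life)


def window(age, width=14.0):
    """Assumed windowing scheme: documents older than `width` are ignored."""
    return 1.0 if age <= width else 0.0


def time_weighted_chi2(docs, term, now, weight_fn):
    """Pearson chi-squared for `term` versus the relevant class, on time-weighted counts.

    docs: iterable of (timestamp, set_of_words, is_relevant) tuples.
    weight_fn: maps a document's age (now - timestamp) to a non-negative weight.
    """
    # Weighted 2x2 contingency table:
    # a = term present, relevant;  b = term present, not relevant
    # c = term absent, relevant;   d = term absent, not relevant
    a = b = c = d = 0.0
    for timestamp, words, relevant in docs:
        w = weight_fn(now - timestamp)
        present = term in words
        if present and relevant:
            a += w
        elif present:
            b += w
        elif relevant:
            c += w
        else:
            d += w

    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0.0:
        # A row or column of the table is empty, so the statistic is undefined;
        # this sketch treats the term as uninformative.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom
```

Under these assumptions, swapping `weight_fn` is all that is needed to move between a decayed weighting and a hard window; periodic feature selection would then rank terms by this score at each re-selection point and keep the top-scoring ones for the classifier.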