Blogger, stick to your story: Modeling topical noise in blogs with coherence measures

Authors
Publication date 2008
Host editors
  • D. Lopresti
  • S. Roy
  • K. Schulz
  • L.V. Subramaniam
Book title Proceedings of SIGIR 2008 Workshop on Analytics for Noisy Unstructured Text Data (AND 08), July 24, 2008, Singapore
ISBN
  • 9781605581965
Series ACM International Conference Proceedings Series
Event 2nd Workshop on Analytics for Noisy Unstructured Text Data (AND 2008), Singapore
Pages (from-to) 39-46
Publisher New York, NY: Association for Computing Machinery
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Topical noise in blogs arises when bloggers digress from the central topical thrust of their blogs. We introduce a method to explicitly incorporate a model of topical noise into a language modeling approach to the task of blog distillation. Topical noise is integrated into the model using a coherence score, which reflects the tightness of the topical structure of a blog. Tests performed on the TREC Blog06 corpus show that a naive integration of the coherence score as a blog prior fails to improve performance. Instead, we develop a set of more sophisticated models in which the coherence score is weighted by a function of the blog retrieval score. The proposed models help improve the effectiveness of our language modeling approach to the blog distillation task.
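The abstract does not define the coherence score or the weighting function. As a minimal sketch, assuming a pairwise measure (the fraction of post pairs in a blog whose cosine similarity reaches a threshold) and a hypothetical weighting function, the contrast between a naive coherence prior and a retrieval-score-weighted coherence term might look like this. The function names (`coherence`, `naive_prior_score`, `weighted_score`), the threshold of 0.3, the whitespace tokenization, and the example weight are all illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Callable


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def coherence(posts: list[str], threshold: float = 0.3) -> float:
    """ASSUMED coherence measure: fraction of post pairs whose cosine
    similarity is at least `threshold`. Tightly focused blogs score near 1;
    blogs that digress a lot score near 0."""
    vectors = [Counter(post.lower().split()) for post in posts]  # naive tokenization
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0  # fewer than two posts: nothing to compare
    close = sum(1 for a, b in pairs if cosine(a, b) >= threshold)
    return close / len(pairs)


def naive_prior_score(log_retrieval: float, coh: float, eps: float = 1e-6) -> float:
    """Naive integration: coherence used as a static blog prior, added in log space."""
    return log_retrieval + math.log(coh + eps)


def weighted_score(
    log_retrieval: float,
    coh: float,
    weight_fn: Callable[[float], float],
) -> float:
    """Coherence contribution scaled by a function of the retrieval score.
    `weight_fn` is a HYPOTHETICAL placeholder; the abstract does not specify it."""
    return log_retrieval + weight_fn(log_retrieval) * coh


if __name__ == "__main__":
    posts = ["apple banana pie", "apple banana bread", "car engine repair"]
    coh = coherence(posts)
    print(f"coherence = {coh:.3f}")
    print("naive:   ", naive_prior_score(-2.0, coh))
    # Illustrative weight: let coherence count more for blogs that already match the query well.
    print("weighted:", weighted_score(-2.0, coh, lambda s: math.exp(s)))
```

In this sketch the naive prior shifts every blog's score by the same coherence-dependent amount regardless of query relevance, whereas the weighted variant lets the retrieval score modulate how much coherence matters, which is the distinction the abstract draws.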
Document type Conference contribution
Language English
Published at https://doi.org/10.1145/1390749.1390757