Blogger, stick to your story: Modeling topical noise in blogs with coherence measures
| Field | Value |
|---|---|
| Publication date | 2008 |
| Book title | Proceedings of SIGIR 2008 Workshop on Analytics for Noisy Unstructured Text Data (AND 08), July 24, 2008, Singapore |
| Series | ACM International Conference Proceeding Series |
| Event | 2nd Workshop on Analytics for Noisy Unstructured Text Data (AND 2008), Singapore |
| Pages (from-to) | 39-46 |
| Publisher | New York, NY: Association for Computing Machinery |
| Abstract | Topical noise in blogs arises when bloggers digress from the central topical thrust of their blogs. We introduce a method to explicitly incorporate a model of topical noise into a language modeling approach to the task of blog distillation. Topical noise is integrated into the model using a coherence score, which reflects the tightness of the topical structure of a blog. Tests performed on the TREC Blog06 corpus show that a naive integration of the coherence score as a blog prior fails to improve performance. Instead, we develop a set of more sophisticated models in which the coherence score is weighted by a function of the blog retrieval score. The proposed models help improve the effectiveness of our language modeling approach to the blog distillation task. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.1145/1390749.1390757 |
