A generative blog post retrieval model that uses query expansion based on external collections

Authors
Publication date 2009
Host editors
  • K.-Y. Su
  • J. Su
  • J. Wiebe
  • H. Li
Book title Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Suntec, Singapore: Volume 2
ISBN
  • 9781932432466
Event Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP (ACL-IJCNLP 2009), Suntec, Singapore
Pages (from-to) 1057-1065
Publisher Morristown, NJ: Association for Computational Linguistics (ACL)
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
User-generated content is characterized by short, noisy documents, with many spelling errors and unexpected language usage. To bridge the vocabulary gap between the user's information need and documents in a specific user-generated content environment, the blogosphere, we apply a form of query expansion, i.e., adding and reweighting query terms. Since the blogosphere is noisy, query expansion on the collection itself is rarely effective, but external, edited collections are more suitable. We propose a generative model for expanding queries using external collections in which dependencies between queries, documents, and expansion documents are explicitly modeled. We discuss different instantiations of our model, which make different (in)dependence assumptions. Results using two external collections (news and Wikipedia) show that external expansion for retrieval of user-generated content is effective; in addition, conditioning the external collection on the query is very beneficial, and making candidate expansion terms dependent on just the document seems sufficient.
Document type Conference contribution
Language English
Published at http://portal.acm.org/citation.cfm?id=1690294