Using metafeatures to increase the effectiveness of latent semantic models in web search

Open Access
Authors
Publication date 2016
Book title WWW'16
Book subtitle proceedings of the 25th International Conference on World Wide Web : May 11-15, 2016, Montreal, Canada
ISBN
  • 9781450341431
Event WWW 2016: The 25th International Conference on World Wide Web
Pages (from-to) 1081-1091
Publisher New York, NY: Association for Computing Machinery
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
In web search, latent semantic models have been proposed to bridge the lexical gap between queries and documents that is due to the fact that searchers and content creators often use different vocabularies and language styles to express the same concept. Modern search engines simply use the outputs of latent semantic models as features for a so-called global ranker. We argue that this is not optimal, because a single value output by a latent semantic model may be insufficient to describe all aspects of the model's prediction, and thus some information captured by the model is not used effectively by the search engine. To increase the effectiveness of latent semantic models in web search, we propose to create metafeatures (feature vectors that describe the structure of the model's prediction for a given query-document pair) and pass them to the global ranker along with the models' scores. We provide simple guidelines to represent the latent semantic model's prediction with more than a single number, and illustrate these guidelines using several latent semantic models. We test the impact of the proposed metafeatures on a web document ranking task using four latent semantic models. Our experiments show that (1) through the use of metafeatures, the performance of each individual latent semantic model can be improved by 10.2% and 4.2% in NDCG scores at truncation levels 1 and 10; and (2) through the use of metafeatures, the performance of a combination of latent semantic models can be improved by 7.6% and 3.8% in NDCG scores at truncation levels 1 and 10, respectively.
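The idea of describing a prediction with more than a single number can be illustrated with a minimal sketch. It assumes a hypothetical latent semantic model that embeds the query and the document as vectors and scores the pair by cosine similarity; the function names and the choice of per-dimension contributions as metafeatures are illustrative assumptions, not the construction used in the paper.

```python
import math
from typing import List


def _norm_product(q: List[float], d: List[float]) -> float:
    """Product of the Euclidean norms of the two vectors."""
    return math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in d))


def cosine_score(q: List[float], d: List[float]) -> float:
    """Single-value output of the (hypothetical) latent semantic model."""
    norm = _norm_product(q, d)
    if norm == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(q, d)) / norm


def metafeatures(q: List[float], d: List[float]) -> List[float]:
    """Illustrative metafeatures: each latent dimension's contribution to
    the cosine score. The contributions sum to the score itself, so they
    expose how the prediction is structured (assumption, not the paper's
    definition)."""
    norm = _norm_product(q, d)
    if norm == 0.0:
        return [0.0] * len(q)
    return [a * b / norm for a, b in zip(q, d)]


def ranker_input(q: List[float], d: List[float]) -> List[float]:
    """Feature vector for a global ranker: the score plus its metafeatures."""
    return [cosine_score(q, d)] + metafeatures(q, d)
```

Under these assumptions, two query-document pairs with the same cosine score can still produce different metafeature vectors, which is the kind of extra signal a global ranker could exploit.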
Document type Conference contribution
Language English
Published at https://doi.org/10.1145/2872427.2882987
Downloads
borisov-using-2016 (Accepted author manuscript)