- The impact of collection size on relevance and diversity
- 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2010), Geneva, Switzerland
- Book/source title: SIGIR 2010: proceedings: 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval: Geneva, Switzerland, July 19-23, 2010
- Pages (from-to)
- New York, NY: Association for Computing Machinery
- Document type: Conference contribution
- Interfacultary Research Institutes
- Institute for Logic, Language and Computation (ILLC)
It has been observed that precision increases with collection size. One explanation could be that the redundancy of information increases, making it easier to find multiple documents conveying the same information. Arguably, a user has no interest in reading the same information over and over, but would prefer a set of diverse search results covering multiple aspects of the search topic. In this paper, we look at the impact of the collection size on the relevance and diversity of retrieval results by down-sampling the collection.
Our main finding is that we can improve diversity by randomly removing the majority of the results: this significantly reduces the redundancy and only marginally affects the subtopic coverage.
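The down-sampling idea in the abstract can be illustrated with a small sketch. This is not the authors' experimental setup: the function names, the toy ranking, the subtopic labels, and the two measures (subtopic recall at a cutoff, and the share of relevant results that add no new subtopic) are all assumptions made for illustration. It also assumes that removing documents from the collection simply removes them from the ranking without changing the order of the rest.

```python
import random


def downsample(ranking, keep_fraction, seed=0):
    """Keep each result independently with probability keep_fraction.

    The relative order of the surviving results is preserved.
    """
    rng = random.Random(seed)
    return [doc for doc in ranking if rng.random() < keep_fraction]


def subtopic_recall(ranking, subtopics, k):
    """Fraction of all known subtopics covered by the top-k results."""
    all_subtopics = set().union(*subtopics.values())
    if not all_subtopics:
        return 0.0
    seen = set()
    for doc in ranking[:k]:
        seen |= subtopics.get(doc, set())
    return len(seen) / len(all_subtopics)


def redundancy(ranking, subtopics, k):
    """Fraction of relevant top-k results that add no new subtopic."""
    seen = set()
    relevant = 0
    redundant = 0
    for doc in ranking[:k]:
        topics = subtopics.get(doc, set())
        if not topics:
            continue  # non-relevant result: not counted either way
        relevant += 1
        if topics <= seen:
            redundant += 1
        seen |= topics
    return redundant / relevant if relevant else 0.0


# Hypothetical toy data: a ranked list and the subtopics each document covers.
ranking = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]
subtopics = {
    "d1": {"a"}, "d2": {"a"}, "d3": {"a"}, "d4": {"b"},
    "d5": {"a"}, "d6": {"b"}, "d7": {"c"}, "d8": {"a", "c"},
}

sampled = downsample(ranking, keep_fraction=0.3, seed=42)
for name, run in (("full", ranking), ("sampled", sampled)):
    print(name, run,
          "recall@5 =", subtopic_recall(run, subtopics, 5),
          "redundancy@5 =", redundancy(run, subtopics, 5))
```

Averaging the two measures over many seeds and keep fractions would give a rough picture of how redundancy and coverage change as more of the ranking is removed; a single toy run like this one shows only the mechanics.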