An Easter Egg Hunting Approach to Test Collection Building in Dynamic Domains
| Authors | |
|---|---|
| Publication date | 07-06-2016 |
| Host editors | |
| Book title | Proceedings of the Seventh International Workshop on Evaluating Information Access (EVIA 2016) |
| Book subtitle | a Satellite Workshop of the NTCIR-12 Conference, June 7, 2016 Tokyo Japan |
| ISBN (electronic) | |
| Event | Seventh International Workshop on Evaluating Information Access |
| Number of pages | 8 |
| Publisher | Tokyo: National Institute of Informatics |
| Organisations | |
| Abstract | Test collections for offline evaluation remain crucial for information retrieval research and industrial practice, yet the classical Sparck Jones and Van Rijsbergen approach to test collection building, based on pooling runs over a large collection, is expensive and is being pushed beyond its limits by the ever-increasing size and dynamic nature of collections. We experiment with a novel approach to reusable test collection building, in which we inject judged pages into an existing corpus and have systems retrieve pages from the extended corpus, with the aim of creating a reusable test collection. Metaphorically, we hide the Easter eggs for systems to retrieve. Our experiments exploit the unique setup of the TREC Contextual Suggestion Track, which allowed submissions both from a fixed corpus (ClueWeb12) and from the open web. We conduct an extensive analysis of the reusability of the test collection based on ClueWeb12 and find it too low for reliable offline testing. We then detail the expansion with judged pages from the open web, conduct an extensive analysis of the reusability of the resulting expanded test collection, and observe a dramatic increase in reusability. Our approach offers novel and cost-effective ways to build new test collections and to refresh and update existing ones. This work explores new ways of effectively maintaining offline test collections for dynamic domains such as the web. |
| Document type | Conference contribution |
| Language | English |
| Related publication | Test Collection Building and Maintenance in Dynamic Domains |
| Published at | http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings12/pdf/evia/01-EVIA2016-HashemiS.pdf |
| Other links | http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings12/EVIA/toc_evia.html |
| Downloads | 01-EVIA2016-HashemiS (Final published version) |
| Permalink to this page | |
