M. de Rijke
- Creating the DISEQuA corpus: a multilingual test set for the monolingual question Answering tasks at CLEF 2003
- Book title
- Working notes for the CLEF 2003 Workshop
- Document type
- Faculty of Science (FNWI)
- Informatics Institute (IVI)
This paper describes the procedure adopted by the three co-ordinators of the CLEF 2003 question answering track (ITC-irst, UNED and ILLC) to create the question set for the monolingual tasks. Despite thelittle resources available, the three groups collaborated and managed to formulate and verify a large pool of original questions posed in three different languages: Dutch, Italian and Spanish. A part of these queries was translated into English and shared between the three coordination groups. Thus, a second cross-verification wasconducted, in order to extract the queries that had an answer in all the three monolingual document collections. Finally, the result of the joint efforts was the creation of the DISEQuA (Dutch Italian Spanish English Questions and Answers) corpus, a useful and reusable resource that is freely available for the research community. Thearticle reports on the different stages of the corpus creation, from the monolingual kernels to the multilingual extension.
If you believe that digital publication of certain material infringes any of your rights or (privacy) interests, please let the Library know, stating your reasons. In case of a legitimate complaint, the Library will make the material inaccessible and/or remove it from the website. Please Ask the Library, or send a letter to: Library of the University of Amsterdam, Secretariat, Singel 425, 1012 WP Amsterdam, The Netherlands. You will be contacted as soon as possible.