- Understanding Content-and-Structure
- Book/source title
- Proceedings INEX 2005 Workshop on Element Retrieval Methodology
- Pages (from-to)
- Document type
- Conference contribution
- Faculty of Science (FNWI)
Interfacultary Research Institutes
- Informatics Institute (IVI)
Institute for Logic, Language and Computation (ILLC)
Document-centric XML is a mixture of text and structure. +With the increased availability of document-centric XML content comes a need for query facilities in which both structural
constraints and constraints on the content of the documents can be expressed. This has generated considerable interest in both the IR and DB communities, and has lead to
the launch of evaluation efforts tailored for XML documents. One of the driving and long-standing research questions here is: How does the increased expressiveness of languages for
querying XML documents help users to better, and more effectively, express their information needs? And closely related to this: How should we evaluate systems that enable users to express their information needs using both content and structural constraints?
In this paper we address these research questions. Our analysis follows two lines: What requirements can in principle be expressed in query languages for document-centric XML documents? And: How do users actually use such languages? For the former, we provide mathematical characterizations of two query languages, one for users with next to no
knowledge of the document structure (ignorant users), and one for users that have some, but not complete, knowledge of the document structure (semi-ignorant users). To address the latter issue, we examine the topics formulated in the second query language as part of the 2004 edition of the INEX XML retrieval initiative. Our main findings are as follows:
First, while structure is used in varying degrees of complexity, over half of the queries can be expressed in the very restrictive ignorant user language. Second, structure is used
as a search hint, and not a search requirement, when judged against the underlying information need. Third, the use of structure in queries functions as a precision device. Fourth,
the underlying retrieval task of content-and-structure querying is no different from the ordinary natural language query retrieval task. From those findings we derive a number of recommendations for the evaluation of systems that cater for content-and-structure queries.
If you believe that digital publication of certain material infringes any of your rights or (privacy) interests, please let the Library know, stating your reasons. In case of a legitimate complaint, the Library will make the material inaccessible and/or remove it from the website. Please Ask the Library, or send a letter to: Library of the University of Amsterdam, Secretariat, Singel 425, 1012 WP Amsterdam, The Netherlands. You will be contacted as soon as possible.