Indexing Units of Structured Text Retrieval

Authors
Publication date 2018
Host editors
  • L. Liu
  • M.T. Özsu
Book title Encyclopedia of Database Systems
ISBN
  • 9781461482666
ISBN (electronic)
  • 9781461482659
Edition 2nd
Pages (from-to) 1907-1911
Number of pages 5
Publisher New York, NY: Springer
Organisations
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract

The term indexing units refers to the granularity of information in a retrieval system's index. In principle this granularity can be any part of a structured text, and it consequently determines the possible units of retrieval. There are three basic approaches. The first is to index every potentially retrievable unit as a whole, the so-called element-based approach. The second is to index disjoint nodes and rely on aggregation or score-propagation methods to score higher-level nodes. The third is to index only selected elements, for example by indexing particular element types in separate indexes. Various mixtures of these approaches have also been applied.
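The three approaches can be contrasted in a small sketch. It rests on several assumptions that do not come from the entry: a hypothetical toy XML document, a naive word tokenizer, and raw term frequency as the score. For the disjoint approach it also assumes one simple propagation rule, in which a node's score is its own term frequency plus a hypothetical `decay` factor (0.5 here) times the sum of its children's scores. Real systems use proper retrieval models and a variety of aggregation functions.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical toy document; tag names are illustrative only.
DOC = """<article><title>Structured retrieval</title>
<sec><title>Indexing</title><p>Element indexing stores every element.</p>
<p>Disjoint indexing stores text once.</p></sec>
<sec><title>Scoring</title><p>Scores propagate upward.</p></sec></article>"""


def tokens(text):
    """Naive lower-cased word tokenizer (an assumption, not a real analyzer)."""
    return re.findall(r"\w+", text.lower())


def children_with_paths(elem, path):
    """Yield (child, path) pairs using XPath-like positional steps such as /sec[2]."""
    counts = Counter()
    for child in elem:
        counts[child.tag] += 1
        yield child, f"{path}/{child.tag}[{counts[child.tag]}]"


def walk(elem, path):
    """Yield (path, element) for elem and all its descendants in document order."""
    yield path, elem
    for child, child_path in children_with_paths(elem, path):
        yield from walk(child, child_path)


def element_based_index(root):
    """Approach 1: every element is indexed as a whole (all text in its subtree)."""
    return {p: Counter(tokens(" ".join(e.itertext())))
            for p, e in walk(root, "/" + root.tag)}


def own_text(elem):
    """Text directly inside elem but not inside any child: elem.text plus child tails."""
    return " ".join([elem.text or ""] + [c.tail or "" for c in elem])


def disjoint_index(root):
    """Approach 2: each piece of text is indexed exactly once, at its innermost node."""
    return {p: Counter(tokens(own_text(e))) for p, e in walk(root, "/" + root.tag)}


def propagated_scores(root, query, decay=0.5):
    """Score every node from the disjoint index: own term frequency plus a
    decayed sum of its children's scores (one simple propagation scheme)."""
    index, q, scores = disjoint_index(root), tokens(query), {}

    def score(elem, path):
        own = sum(index[path][t] for t in q)
        below = sum(score(c, cp) for c, cp in children_with_paths(elem, path))
        scores[path] = own + decay * below
        return scores[path]

    score(root, "/" + root.tag)
    return scores


def selective_indexes(root, tags=("sec", "p")):
    """Approach 3: separate indexes for selected element types only."""
    indexes = {tag: {} for tag in tags}
    for p, e in walk(root, "/" + root.tag):
        if e.tag in indexes:
            indexes[e.tag][p] = Counter(tokens(" ".join(e.itertext())))
    return indexes


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    scores = propagated_scores(root, "indexing")
    for path, s in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{s:5.2f}  {path}")
```

For the query "indexing", the propagated scores rank the first section (1.5) above its title and paragraphs (1.0 each) and above the whole article (0.75). The element-based index, by contrast, gives the article and the first section the same raw count. The `decay` parameter is what trades off specific against broad units in this sketch.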

All approaches make implicit or explicit assumptions about the (most likely) unit of retrieval. Even when there is no designated retrieval unit (such as the document or the root node of the structured document), not every document part (such as a subtree of the structured document) is an equally desirable retrieval unit. Such assumptions may be relatively generic (for example, that paragraphs and sections are more informative than very short excerpts in bold or italics) or may depend on the query at hand (for example, a structured query requesting elements with a particular tag). In all cases, these assumptions depend on the kind of structured documents, which may range from strict XML databases to loosely structured textual documents with markup, and on the kind of information need, which may range from a strict database query with well-defined semantics to a vague information retrieval request. Structured text retrieval typically deals with loosely structured textual documents and vague information retrieval queries.
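One way such assumptions might surface in practice is as a filter on candidate units, sketched below under stated assumptions. The set of inline formatting tags (`FORMATTING_TAGS`) and the minimum length (`MIN_TOKENS`) are hypothetical values chosen for illustration, and `candidate_units` is a hypothetical helper, not part of any system described in the entry. Without a target tag, the function applies the generic assumption. With a target tag, as a structured query might supply, it applies the query-dependent one.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical settings; the entry names no specific tags or thresholds.
FORMATTING_TAGS = {"b", "i", "em"}
MIN_TOKENS = 5

# Hypothetical toy document.
DOC = """<article><sec><title>Overview</title>
<p>Sections and paragraphs are <b>informative</b> units of retrieval.</p></sec></article>"""


def n_tokens(elem):
    """Number of word tokens in the element's whole subtree."""
    return len(re.findall(r"\w+", " ".join(elem.itertext())))


def candidate_units(root, target_tag=None, min_tokens=MIN_TOKENS):
    """Return, in document order, the elements treated as plausible retrieval units."""
    if target_tag is not None:
        # Query-dependent assumption: a structured query names the desired tag.
        return [e for e in root.iter() if e.tag == target_tag]
    # Generic assumption: skip inline formatting and very short fragments.
    return [e for e in root.iter()
            if e.tag not in FORMATTING_TAGS and n_tokens(e) >= min_tokens]


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    print([e.tag for e in candidate_units(root)])                 # ['article', 'sec', 'p']
    print([e.tag for e in candidate_units(root, target_tag="p")])  # ['p']
```

Here the one-word title and the bold excerpt are dropped under the generic assumption. Where the threshold is set matters: raising `min_tokens` to 9 would also exclude the paragraph.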

Document type Entry for encyclopedia/dictionary
Language English
Published at https://doi.org/10.1007/978-1-4614-8265-9_202
Other links https://www.scopus.com/pages/publications/105012687389