Term Statistics for Structured Text Retrieval

Authors
Publication date 2018
Host editors
  • L. Liu
  • M.T. Özsu
Book title Encyclopedia of Database Systems
ISBN
  • 9781461482666
ISBN (electronic)
  • 9781461482659
Edition 2nd
Pages (from-to) 4055-4056
Number of pages 2
Publisher New York, NY: Springer
Organisations
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract
Classical ranking algorithms in information retrieval make use of term statistics, the most common (and basic) ones being within-document term frequency, tf, and document frequency, df. tf is the number of occurrences of a term in a document and is used to reflect how well a term captures the topic of a document, whereas df is the number of documents in which a term appears and is used to reflect how well a term discriminates between relevant and non-relevant documents. df is commonly used in the form of inverse document frequency, idf, since df is inversely related to the importance of a term. Both tf and idf are obtained at indexing time. Ranking algorithms for structured text retrieval, and more precisely XML retrieval, require similar term statistics, but with respect to elements.
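The definitions above can be illustrated with a minimal Python sketch. It is not taken from the entry itself: the function names (tokenize, term_stats, element_units), the toy XML documents, and the choice of idf = log(N / df) are all assumptions made for illustration, and other idf variants are in common use. The sketch also assumes that every XML element, including nested ones, counts as its own retrieval unit, which is only one of several possible ways to define element-level statistics.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def term_stats(units):
    """Compute tf, df and idf over a dict mapping unit id -> text.

    A "unit" is a whole document or a single element, depending on the caller.
    """
    tf = {uid: Counter(tokenize(text)) for uid, text in units.items()}
    df = Counter()
    for counts in tf.values():
        df.update(counts.keys())  # each unit contributes at most 1 per term
    n = len(units)
    # Assumed idf variant: log(N / df); this is an illustrative choice.
    idf = {term: math.log(n / count) for term, count in df.items()}
    return tf, df, idf


def element_units(doc_id, xml_string):
    """Treat every element (with its descendants' text) as a retrieval unit."""
    root = ET.fromstring(xml_string)
    units = {}
    for i, elem in enumerate(root.iter()):  # document order, root first
        units[f"{doc_id}/{elem.tag}[{i}]"] = " ".join(elem.itertext())
    return units


# Toy collection (invented for this example).
docs = {
    "d1": "<article><title>XML retrieval</title>"
          "<sec>term statistics for XML</sec></article>",
    "d2": "<article><title>ranking</title>"
          "<sec>term frequency</sec></article>",
}

# Document-level statistics: one unit per document.
doc_units = {d: " ".join(ET.fromstring(x).itertext()) for d, x in docs.items()}
doc_tf, doc_df, doc_idf = term_stats(doc_units)

# Element-level statistics: one unit per element.
el_units = {}
for d, x in docs.items():
    el_units.update(element_units(d, x))
el_tf, el_df, el_idf = term_stats(el_units)
```

Because nested elements overlap, a term occurring in a section is also counted for the enclosing article in this sketch, so element-level counts are larger than document-level ones for the same collection.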
Document type Entry for encyclopedia/dictionary
Language English
Published at https://doi.org/10.1007/978-1-4614-8265-9_412
Other links https://www.scopus.com/pages/publications/105012674049