Term Statistics for Structured Text Retrieval

Authors	J. Kamps, M. Lalmas
Publication date	2018
Host editors	L. Liu, M.T. Özsu
Book title	Encyclopedia of Database Systems
ISBN	9781461482666
ISBN (electronic)	9781461482659
Edition	2nd
Pages (from-to)	4055-4056
Number of pages	2
Publisher	New York, NY: Springer
Organisations	Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract	Classical ranking algorithms in information retrieval make use of term statistics. The most basic and most common are within-document term frequency, tf, and document frequency, df. tf is the number of occurrences of a term in a document and reflects how well the term captures the topic of that document. df is the number of documents in which a term appears and reflects how well the term discriminates between relevant and non-relevant documents. df is commonly used in its inverse form, inverse document frequency (idf), because a term's importance is inversely related to the number of documents that contain it. Both tf and idf are computed at indexing time. Ranking algorithms for structured text retrieval, and more precisely XML retrieval, require similar term statistics, but computed with respect to elements rather than whole documents.
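To make these statistics concrete, the following is a minimal sketch, not the authors' method. It computes tf, df and idf over a set of retrieval units. The same function is applied once to whole documents and once to XML elements. Several details are assumptions made for illustration: the helper names (`tokenize`, `term_stats`, `elements_as_units`), the naive whitespace tokenizer, the idf variant log(N/df), and the unit identifiers of the form `tag[i]`. The sketch also assumes that an element's text includes the text of all its descendants.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def tokenize(text):
    # Naive lowercase/whitespace tokenizer (illustrative assumption;
    # real indexers typically also stem and remove stopwords).
    return text.lower().split()


def term_stats(units):
    """Compute tf, df and idf for a dict mapping unit id -> text.

    A 'unit' is whatever is retrieved: a whole document in classical IR,
    an element in XML retrieval.
    """
    # tf: per-unit count of each term's occurrences.
    tf = {uid: Counter(tokenize(text)) for uid, text in units.items()}
    # df: number of units containing the term (counted once per unit).
    df = Counter()
    for counts in tf.values():
        df.update(counts.keys())
    # idf: one common variant, log(N / df), assumed here for illustration.
    n = len(units)
    idf = {term: math.log(n / d) for term, d in df.items()}
    return tf, df, idf


def elements_as_units(xml_string):
    """Turn every element of an XML document into a retrieval unit.

    Hypothetical convention: an element's text is all text it contains,
    including its descendants' text, so nested elements overlap.
    """
    root = ET.fromstring(xml_string)
    units = {}
    for i, elem in enumerate(root.iter()):  # document order, root first
        units[f"{elem.tag}[{i}]"] = " ".join(elem.itertext())
    return units


if __name__ == "__main__":
    # Document-level statistics: each document is one unit.
    docs = {"d1": "xml retrieval", "d2": "text retrieval"}
    _, df_docs, _ = term_stats(docs)
    print(df_docs["retrieval"])  # 2

    # Element-level statistics: each element is one unit.
    xml = ("<article><title>xml retrieval</title>"
           "<sec>term statistics for xml retrieval</sec></article>")
    tf_el, df_el, idf_el = term_stats(elements_as_units(xml))
    print(tf_el["article[0]"]["xml"])  # 2
    print(df_el["xml"])                # 3
```

Because nested elements share text in this sketch, the single occurrence of "xml" in the title is counted in both the `title` and the `article` unit. This is one reason element-level statistics cannot simply be copied from document-level ones. Other conventions, such as counting only an element's own text, are equally possible.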
Document type	Entry for encyclopedia/dictionary
Language	English
Published at	https://doi.org/10.1007/978-1-4614-8265-9_412 (Final published version)
Other links	https://www.scopus.com/pages/publications/105012674049