Term Statistics for Structured Text Retrieval
| Authors | |
|---|---|
| Publication date | 2018 |
| Book title | Encyclopedia of Database Systems |
| Edition | 2nd |
| Pages (from-to) | 4055-4056 |
| Number of pages | 2 |
| Publisher | New York, NY: Springer |
| Abstract |
Classical ranking algorithms in information retrieval make use of term
statistics, the most common (and basic) ones being within-document term
frequency, tf, and document frequency, df. tf is
the number of occurrences of a term in a document and is used to reflect
how well a term captures the topic of a document, whereas df is
the number of documents in which a term appears and is used to reflect
how well a term discriminates between relevant and non-relevant
documents. df is commonly applied in its inverse form, inverse document
frequency, idf, since df is inversely related to the importance of a term.
Both tf and idf are obtained at indexing time. Ranking algorithms for
structured text retrieval, and more precisely XML retrieval, require similar
term statistics, but with respect to elements.
|
| Document type | Entry for encyclopedia/dictionary |
| Language | English |
| Published at | https://doi.org/10.1007/978-1-4614-8265-9_412 |
| Other links | https://www.scopus.com/pages/publications/105012674049 |
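
The term statistics described in the abstract can be illustrated with a minimal sketch. It is not taken from the encyclopedia entry: the helper names (`tokenize`, `term_statistics`, `element_units`), the toy data, and the choice of idf = log(N / df) are assumptions made for illustration, and other idf variants are common. The same counting routine is applied first to whole documents and then to XML elements, where each element is treated as a retrievable unit containing all of its nested text.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (assumption; real indexers do more).
    return text.lower().split()


def term_statistics(units):
    """Return (tf, df, idf) for a dict mapping unit id -> text."""
    # tf: occurrences of each term within each unit.
    tf = {uid: Counter(tokenize(text)) for uid, text in units.items()}
    # df: number of units in which each term appears at least once.
    df = Counter()
    for counts in tf.values():
        df.update(counts.keys())
    n = len(units)
    # idf as log(N / df): one common variant, assumed here for illustration.
    idf = {term: math.log(n / freq) for term, freq in df.items()}
    return tf, df, idf


def element_units(xml_string):
    """Map every element (in document order) to the text it contains."""
    root = ET.fromstring(xml_string)
    units = {}
    for i, elem in enumerate(root.iter()):
        # Hypothetical unit id: tag name plus position in document order.
        units[f"{elem.tag}[{i}]"] = " ".join(elem.itertext())
    return units


# Document-level statistics over a toy collection.
docs = {
    "d1": "xml retrieval ranks xml elements",
    "d2": "document retrieval ranks documents",
}
tf, df, idf = term_statistics(docs)

# Element-level statistics: the same counts, with elements as the units.
article = (
    "<article><title>xml retrieval</title>"
    "<sec><p>xml elements</p><p>term statistics</p></sec></article>"
)
etf, ef, ief = term_statistics(element_units(article))
```

Because elements nest, a term occurring in a paragraph is also counted in every enclosing element in this sketch. How nested text should contribute to element-level statistics is a design choice that the abstract does not settle.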
