Hierarchical Re-estimation of Topic Models for Measuring Topical Diversity
| Authors | |
|---|---|
| Publication date | 2017 |
| Host editors | |
| Book title | Advances in Information Retrieval |
| Book subtitle | 39th European Conference on IR Research, ECIR 2017, Aberdeen, UK, April 8–13, 2017 : proceedings |
| ISBN | |
| ISBN (electronic) | |
| Series | Lecture Notes in Computer Science |
| Event | 39th European Conference on Information Retrieval (ECIR 2017) |
| Pages (from-to) | 68-81 |
| Publisher | Cham: Springer |
| Organisations | |
| Abstract | A high degree of topical diversity is often considered an important characteristic of interesting text documents. A recent proposal for measuring topical diversity identifies three elements for assessing it: words, topics, and documents as collections of words. Topic models play a central role in this approach. Using standard topic models to measure the diversity of documents is suboptimal because of two problems: generality and impurity. General topics include only common information from a background corpus and are assigned to most of the documents in the collection. Impure topics contain words that are not related to the topic; impurity lowers the interpretability of topic models, and impure topics are likely to be assigned to documents erroneously. We propose a hierarchical re-estimation approach for topic models to combat generality and impurity; the proposed approach operates at three levels: words, topics, and documents. Our re-estimation approach for measuring documents' topical diversity outperforms the state of the art on the PubMed dataset, which is commonly used for diversity experiments. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.1007/978-3-319-56608-5_6 |
| Downloads | ECIR2017-HiTR (accepted author manuscript); Azarbonyad2017_Chapter_HierarchicalRe-estimationOfTop (final published version) |
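The abstract does not name the diversity measure behind the "words, topics, and documents" framing. One common choice in this line of work, assumed here and not stated in this record, is Rao's quadratic entropy: a document's diversity is the sum, over every pair of distinct topics, of the product of the document's probabilities for the two topics and the distance between those topics. The minimal sketch below assumes per-document topic proportions and per-topic word distributions from an already-trained topic model, and uses 1 minus cosine similarity as the topic distance; `cosine_distance` and `rao_diversity` are hypothetical helpers, not the authors' implementation.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 minus cosine similarity of two word distributions (assumed topic distance)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rao_diversity(
    doc_topics: Sequence[float], topic_words: Sequence[Sequence[float]]
) -> float:
    """Rao's quadratic entropy of one document (hypothetical helper).

    doc_topics[i]  : p(topic i | document), summing to 1.
    topic_words[i] : p(word | topic i) over a shared vocabulary.
    """
    k = len(doc_topics)
    total = 0.0
    for i in range(k):
        for j in range(k):
            if i == j:
                # A topic's distance to itself is zero, so the diagonal is skipped.
                continue
            total += (
                doc_topics[i]
                * doc_topics[j]
                * cosine_distance(topic_words[i], topic_words[j])
            )
    return total


if __name__ == "__main__":
    # Two toy topics over a 4-word vocabulary that share no words.
    topics = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
    print(rao_diversity([1.0, 0.0], topics))  # prints 0.0 (single-topic document)
    print(rao_diversity([0.5, 0.5], topics))  # prints 0.5 (evenly mixed document)
```

In this toy setting a document concentrated on one topic scores 0, while an even mix of two dissimilar topics scores 0.5. The sketch also suggests why generality matters: a general topic that receives probability mass in most documents adds cross-topic terms to nearly every score, which can overstate diversity across the whole collection.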
