Dense Retrieval with Entity Views
| Authors | |
|---|---|
| Publication date | 2022 |
| Book title | CIKM '22 |
| Book subtitle | proceedings of the 31st ACM International Conference on Information & Knowledge Management : October 17-21, 2022, Atlanta, GA, USA |
| ISBN (electronic) | |
| Event | 31st ACM International Conference on Information and Knowledge Management, CIKM 2022 |
| Pages (from-to) | 1955–1964 |
| Publisher | New York, NY: The Association for Computing Machinery |
| Organisations | |
| Abstract | Pre-trained language models like BERT have been demonstrated to be both effective and efficient ranking methods when combined with approximate nearest neighbor search, which can quickly match dense representations of queries and documents. However, pre-trained language models alone do not fully capture information about uncommon entities. In this work, we investigate methods for enriching dense query and document representations with entity information from an external source. Our proposed method identifies groups of entities in a text and encodes them into a dense vector representation, which is then used to enrich BERT's vector representation of the text. To handle documents that contain many loosely related entities, we devise a strategy for creating multiple entity representations that reflect different views of a document. For example, a document about a scientist may cover aspects of her personal life and recent work, which correspond to different views of the entity. In an evaluation on MS MARCO benchmarks, we find that enriching query and document representations in this way yields substantial increases in effectiveness. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.1145/3511808.3557285 |
| Downloads | 3511808.3557285 (Final published version) |
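
The abstract's idea of enriching a text vector with entity information, and of keeping several entity "views" per document, can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names (`mean_pool`, `enrich`, `score`), the use of concatenation to combine the text and entity vectors, mean pooling over an entity group, and taking the maximum over views at scoring time. The vectors are hand-written stand-ins for BERT and entity embeddings.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_pool(vectors: Sequence[Sequence[float]]) -> Vector:
    """Average a group of entity embeddings into one view vector (assumed pooling)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def enrich(text_vec: Sequence[float], entity_vec: Sequence[float]) -> Vector:
    """Combine a text vector with an entity vector.

    Concatenation is an assumption; the abstract does not say how the
    two representations are combined.
    """
    return list(text_vec) + list(entity_vec)


def score(query_text: Sequence[float], query_entity: Sequence[float],
          doc_text: Sequence[float], doc_views: Sequence[Sequence[float]]) -> float:
    """Score a document as the best match over its entity views (assumed max)."""
    q = enrich(query_text, query_entity)
    return max(dot(q, enrich(doc_text, view)) for view in doc_views)


# Toy example: a document about a scientist with two entity views.
personal_life_view = mean_pool([[1.0, 0.0], [1.0, 0.0]])  # hypothetical entity group
recent_work_view = mean_pool([[0.0, 1.0], [0.0, 1.0]])    # hypothetical entity group

result = score(
    query_text=[1.0, 0.0],
    query_entity=[0.0, 1.0],  # the query is about the scientist's work
    doc_text=[0.5, 0.5],
    doc_views=[personal_life_view, recent_work_view],
)
print(result)
```

In this toy setup the query matches the "recent work" view more strongly than the "personal life" view, so the document's score comes from that view. In a real system each enriched view vector would presumably be stored as its own entry in an approximate nearest neighbor index; that detail is also an assumption here.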