Distillation vs. Sampling for Efficient Training of Learning to Rank Models
| Authors | |
|---|---|
| Publication date | 2024 |
| Book title | ICTIR '24 |
| Book subtitle | Proceedings of the 2024 ACM SIGIR International Conference on the Theory of Information Retrieval: July 13, 2024, Washington, DC, USA |
| ISBN (electronic) | |
| Event | 14th International Conference on the Theory of Information Retrieval |
| Pages (from-to) | 51-60 |
| Number of pages | 10 |
| Publisher | New York, New York: The Association for Computing Machinery |
| Organisations | |
| Abstract | In real-world search settings, learning to rank (LtR) models are trained and tuned repeatedly using large amounts of data, consuming significant time and computing resources and raising efficiency and sustainability concerns. One way to address these concerns is to reduce the size of training datasets. Dataset sampling and distillation are two classes of methods introduced to enable a significant reduction in dataset size while achieving performance comparable to training with the complete data. In this work, we perform a comparative analysis of dataset distillation and sampling methods in the context of LtR. We evaluate gradient matching and distribution matching dataset distillation approaches -- shown to be effective in computer vision -- and show how these algorithms can be adjusted for the LtR task. Our empirical analysis, using three LtR datasets, indicates that, in contrast to previous studies in computer vision, the selected distillation methods do not outperform random sampling. Our code and experimental settings are released alongside the paper. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.1145/3664190.3672527 |
| Downloads | 3664190.3672527 (Final published version) |
