Finding Influential Training Samples for Gradient Boosted Decision Trees

Authors	B. Sharchilev, Y. Ustinovsky, P. Serdyukov, M. de Rijke
Publication date	2018
Journal	Proceedings of Machine Learning Research
Event	35th International Conference on Machine Learning, ICML 2018
Volume | Issue number	80
Pages (from-to)	4577-4585
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	We address the problem of finding influential training samples for a particular case of tree ensemble-based models, e.g., Random Forest (RF) or Gradient Boosted Decision Trees (GBDT). A natural way of formalizing this problem is studying how the model's predictions change upon leave-one-out retraining, leaving out each individual training sample. Recent work has shown that, for parametric models, this analysis can be conducted in a computationally efficient way. We propose several ways of extending this framework to non-parametric GBDT ensembles under the assumption that tree structures remain fixed. Furthermore, we introduce a general scheme of obtaining further approximations to our method that balance the trade-off between performance and computational complexity. We evaluate our approaches on various experimental setups and use-case scenarios and demonstrate both the quality of our approach to finding influential training samples in comparison to the baselines and its computational efficiency.
Document type	Article
Note	International Conference on Machine Learning, 10-15 July 2018, Stockholmsmässan, Stockholm, Sweden. - With supplementary file. - In print proceedings pp. 7287-7296.
Language	English
Published at	http://proceedings.mlr.press/v80/sharchilev18a.html
Other links	http://www.proceedings.com/40527.html https://www.scopus.com/pages/publications/85057338736
Downloads	sharchilev18a (Final published version)
Supplementary materials	sharchilev18a-supp