- Bayesian Dark Knowledge
- Advances in Neural Information Processing Systems
- Faculty of Science (FNWI)
- Informatics Institute (IVI)
We consider the problem of Bayesian parameter estimation for deep neural networks, which is important in problem settings where we may have little data, and/or where we need accurate posterior predictive densities p(y|x, D), e.g., for applications involving bandits or active learning. One simple approach to this is to use online Monte Carlo methods, such as SGLD (stochastic gradient Langevin dynamics). Unfortunately, such a method needs to store many copies of the parameters (which wastes memory), and needs to make predictions using many versions of the model (which wastes time). We describe a method for "distilling" a Monte Carlo approximation to the posterior predictive density into a more compact form, namely a single deep neural network. We compare to two very recent approaches to Bayesian neural networks, namely an approach based on expectation propagation [HLA15] and an approach based on variational Bayes [BCKW15]. Our method performs better than both of these, is much simpler to implement, and uses less computation at test time.
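The following is a minimal, illustrative sketch (not the authors' code) of the two ingredients the abstract describes: SGLD iterates serve as approximate posterior samples for a "teacher" network, and a single "student" network is then trained to match the Monte Carlo-averaged posterior predictive. The toy data, network sizes, step sizes, and burn-in/thinning settings below are all assumptions chosen only to keep the example self-contained.

```python
# Hedged sketch: SGLD posterior sampling followed by distillation into one network.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
N, D, C = 500, 10, 3                      # assumed toy dataset size, input dim, classes
X = torch.randn(N, D)
y = torch.randint(0, C, (N,))

def make_net():
    return nn.Sequential(nn.Linear(D, 32), nn.ReLU(), nn.Linear(32, C))

teacher = make_net()
student = make_net()

# --- SGLD: noisy SGD whose iterates approximate posterior samples ---
eta, prior_var, n_batch = 1e-3, 1.0, 50
snapshots = []
for t in range(2000):
    idx = torch.randint(0, N, (n_batch,))
    logits = teacher(X[idx])
    # Minibatch estimate of the negative log joint: rescaled likelihood + Gaussian prior
    nll = F.cross_entropy(logits, y[idx], reduction="sum") * (N / n_batch)
    prior = sum((p ** 2).sum() for p in teacher.parameters()) / (2 * prior_var)
    loss = nll + prior
    teacher.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in teacher.parameters():
            # Langevin update: half-step along the gradient plus injected Gaussian noise
            p -= 0.5 * eta * p.grad
            p += torch.randn_like(p) * eta ** 0.5
    if t > 1000 and t % 50 == 0:          # keep thinned samples after burn-in
        snapshots.append([p.detach().clone() for p in teacher.parameters()])

# Monte Carlo posterior predictive p(y|x, D), averaged over the saved samples
@torch.no_grad()
def mc_predictive(x):
    probs = torch.zeros(x.shape[0], C)
    for sample in snapshots:
        for p, s in zip(teacher.parameters(), sample):
            p.copy_(s)
        probs += F.softmax(teacher(x), dim=1)
    return probs / len(snapshots)

# --- Distillation: train the student to match the Monte Carlo predictive ---
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
for _ in range(500):
    idx = torch.randint(0, N, (n_batch,))
    target = mc_predictive(X[idx])
    log_q = F.log_softmax(student(X[idx]), dim=1)
    kl = F.kl_div(log_q, target, reduction="batchmean")  # KL(teacher avg || student)
    opt.zero_grad()
    kl.backward()
    opt.step()
```

At test time only the single student network is evaluated, which is what removes the memory cost of storing many parameter copies and the time cost of averaging over many model versions mentioned above.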
- Proceedings title: 29th Annual Conference on Neural Information Processing Systems 2015: Montreal, Canada, 7-12 December 2015.
- Volume: 4
- Publisher: Curran Associates
- Place of publication: Red Hook, NY
- Editors: C. Cortes, N.D. Lawrence, D.D. Lee, M. Sugiyama, R. Garnett