Interpretable Neural Predictions with Differentiable Binary Variables
| Authors | Jasmijn Bastings, Wilker Aziz, Ivan Titov |
|---|---|
| Publication date | 2019 |
| Book title | The 57th Annual Meeting of the Association for Computational Linguistics |
| Book subtitle | ACL 2019: proceedings of the conference: July 28-August 2, 2019, Florence, Italy |
| Event | The 57th Annual Meeting of the Association for Computational Linguistics - ACL 2019 |
| Pages (from-to) | 2963-2977 |
| Publisher | Stroudsburg, PA: The Association for Computational Linguistics |
| Abstract | The success of neural networks comes hand in hand with a desire for more interpretability. We focus on text classifiers and make them more interpretable by having them provide a justification (a rationale) for their predictions. We approach this problem by jointly training two neural network models: a latent model that selects a rationale (i.e. a short and informative part of the input text), and a classifier that learns from the words in the rationale alone. Previous work proposed to assign binary latent masks to input positions and to promote short selections via sparsity-inducing penalties such as L0 regularisation. We propose a latent model that mixes discrete and continuous behaviour, allowing both binary selections and gradient-based training without REINFORCE. In our formulation, we can tractably compute the expected value of penalties such as L0, which allows us to directly optimise the model towards a pre-specified text selection rate. We show that our approach is competitive with previous work on rationale extraction, and we explore further uses in attention mechanisms. |
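
The "mixed discrete and continuous" latent model the abstract refers to is a stretched-and-rectified Kumaraswamy (HardKuma) distribution. The sketch below is illustrative, not the authors' released code (see the repository linked under "Other links"): the stretch bounds l = -0.1 and r = 1.1 and all helper names are assumptions. Sampling is reparameterised, so gradients flow through the shape parameters, and the probability of a non-zero sample has a closed form, which is what makes the expected L0 penalty tractable.

```python
import torch

def kuma_cdf(x, a, b):
    """Kumaraswamy CDF on (0, 1): F(x; a, b) = 1 - (1 - x^a)^b."""
    x = x.clamp(1e-6, 1.0 - 1e-6)
    return 1.0 - (1.0 - x ** a) ** b

def hardkuma_sample(a, b, l=-0.1, r=1.1):
    """Reparameterised sample: Kumaraswamy -> stretch to (l, r) -> rectify to [0, 1]."""
    u = torch.rand_like(a)                           # u ~ Uniform(0, 1)
    k = (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)  # inverse-CDF sample
    t = l + (r - l) * k                              # stretch so 0 and 1 fall inside the support
    return t.clamp(0.0, 1.0)                         # rectify: point masses at exactly 0 and 1

def expected_l0(a, b, l=-0.1, r=1.1):
    """Closed-form E[L0] = sum_i P(z_i != 0), differentiable in a and b."""
    p_nonzero = 1.0 - kuma_cdf(torch.full_like(a, -l / (r - l)), a, b)
    return p_nonzero.sum(-1)

# a, b > 0 would come from the latent model, e.g. a softplus of per-token scores
a = torch.nn.functional.softplus(torch.randn(2, 5))
b = torch.nn.functional.softplus(torch.randn(2, 5))
z = hardkuma_sample(a, b)   # (2, 5) mask: exact zeros and ones, plus values in between
print(expected_l0(a, b))    # expected number of selected tokens per sentence
```

Because `expected_l0` is differentiable, the selection rate can be steered directly during training, e.g. with a Lagrange multiplier on the constraint that the expected L0 match the pre-specified rate, rather than by hand-tuning a fixed penalty weight.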
| Document type | Conference contribution |
| Note | Later version also available. |
| Language | English |
| Published at | https://doi.org/10.18653/v1/P19-1284 |
| Other links | https://github.com/bastings/interpretable_predictions |
| Downloads | P19-1284v2 (other version) |
