Interpretable Neural Predictions with Differentiable Binary Variables
| Authors | Jasmijn Bastings, Wilker Aziz, Ivan Titov |
|---|---|
| Publication date | 2019 |
| Book title | The 57th Annual Meeting of the Association for Computational Linguistics |
| Book subtitle | ACL 2019: proceedings of the conference: July 28-August 2, 2019, Florence, Italy |
| Event | The 57th Annual Meeting of the Association for Computational Linguistics - ACL 2019 |
| Pages (from-to) | 2963-2977 |
| Publisher | Stroudsburg, PA: The Association for Computational Linguistics |
| Abstract | The success of neural networks comes hand in hand with a desire for more interpretability. We focus on text classifiers and make them more interpretable by having them provide a justification (a rationale) for their predictions. We approach this problem by jointly training two neural network models: a latent model that selects a rationale (i.e. a short and informative part of the input text), and a classifier that learns from the words in the rationale alone. Previous work proposed to assign binary latent masks to input positions and to promote short selections via sparsity-inducing penalties such as L0 regularisation. We propose a latent model that mixes discrete and continuous behaviour, allowing both binary selections and gradient-based training without REINFORCE. In our formulation, we can tractably compute the expected value of penalties such as L0, which allows us to directly optimise the model towards a pre-specified text selection rate. We show that our approach is competitive with previous work on rationale extraction, and we explore further uses in attention mechanisms. |
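
The "mixed discrete and continuous" latent model the abstract refers to is a stretched-and-rectified Kumaraswamy (HardKuma) distribution. The sketch below is illustrative, not the authors' released code (see the repository linked under "Other links"): the stretch bounds l = -0.1 and r = 1.1 and all helper names are assumptions. Sampling is reparameterised, so gradients flow through the shape parameters, and the probability of a non-zero sample has a closed form, which is what makes the expected L0 penalty tractable.

```python
import torch

def kuma_cdf(x, a, b):
    """Kumaraswamy CDF on (0, 1): F(x; a, b) = 1 - (1 - x^a)^b."""
    x = x.clamp(1e-6, 1.0 - 1e-6)
    return 1.0 - (1.0 - x ** a) ** b

def hardkuma_sample(a, b, l=-0.1, r=1.1):
    """Reparameterised sample: Kumaraswamy -> stretch to (l, r) -> rectify to [0, 1]."""
    u = torch.rand_like(a)                           # u ~ Uniform(0, 1)
    k = (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)  # inverse-CDF sample
    t = l + (r - l) * k                              # stretch so 0 and 1 fall inside the support
    return t.clamp(0.0, 1.0)                         # rectify: point masses at exactly 0 and 1

def expected_l0(a, b, l=-0.1, r=1.1):
    """Closed-form E[L0] = sum_i P(z_i != 0), differentiable in a and b."""
    p_nonzero = 1.0 - kuma_cdf(torch.full_like(a, -l / (r - l)), a, b)
    return p_nonzero.sum(-1)

# a, b > 0 would come from the latent model, e.g. a softplus of per-token scores
a = torch.nn.functional.softplus(torch.randn(2, 5))
b = torch.nn.functional.softplus(torch.randn(2, 5))
z = hardkuma_sample(a, b)   # (2, 5) mask: exact zeros and ones, plus values in between
print(expected_l0(a, b))    # expected number of selected tokens per sentence
```

Because `expected_l0` is differentiable, the selection rate can be steered directly during training, e.g. with a Lagrange multiplier on the constraint that the expected L0 match the pre-specified rate, rather than by hand-tuning a fixed penalty weight.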
| Document type | Conference contribution |
| Note | Later version also available. |
| Language | English |
| Published at | https://doi.org/10.18653/v1/P19-1284 |
| Other links | https://github.com/bastings/interpretable_predictions |
| Downloads | P19-1284v2 (other version) |
