Improving Word Embedding Compositionality using Lexicographic Definitions

Open Access
Authors
Publication date 2018
Book title The Web Conference 2018
Book subtitle companion of the World Wide Web Conference WWW2018 : April 23-27, 2018, Lyon, France
ISBN (electronic)
  • 9781450356404
Event World Wide Web Conference WWW2018
Pages (from-to) 1083-1093
Publisher [Geneva]: International World Wide Web Conferences Steering Committee
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
  • Faculty of Science (FNWI)
Abstract
We present an in-depth analysis of four popular word embeddings (Word2Vec, GloVe, fastText and Paragram) in terms of their semantic compositionality. In addition, we propose a method to tune these embeddings towards better compositionality. We find that training the existing embeddings to compose lexicographic definitions significantly improves their performance on this task, while achieving similar or better performance in both word similarity and sentence embedding evaluations.

Our method tunes word embeddings using a simple neural network architecture with definitions and lemmas from WordNet. Since dictionary definitions are semantically similar to their associated lemmas, they are ideal candidates both for our tuning method and for evaluating compositionality. Our architecture allows the embeddings to be composed using simple arithmetic operations, which makes them particularly suitable for production applications such as web search and data mining. We also explore more elaborate compositional models.
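The abstract does not spell out the tuning network or its loss. As a minimal sketch, assuming additive composition (summing word vectors) and a squared Euclidean objective that pulls the composed definition vector toward its lemma's vector, a single gradient step could look like the code below. The toy vocabulary, vectors, learning rate, and the choice to update only the definition words are hypothetical; the paper's actual architecture and loss may differ.

```python
from typing import Dict, List

Vector = List[float]


def compose(words: List[str], emb: Dict[str, Vector]) -> Vector:
    """Additive composition: element-wise sum of the word vectors."""
    out = [0.0] * len(emb[words[0]])
    for w in words:
        for i, x in enumerate(emb[w]):
            out[i] += x
    return out


def sq_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (sensitive to vector magnitude)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def tune_step(definition: List[str], lemma: str,
              emb: Dict[str, Vector], lr: float = 0.1) -> float:
    """One gradient-descent step on ||sum(definition) - lemma||^2.

    Assumption (hypothetical): only the definition word vectors are updated;
    the lemma vector is treated as a fixed target. Returns the loss before
    the update.
    """
    composed = compose(definition, emb)
    target = emb[lemma]
    loss = sq_distance(composed, target)
    # d(loss)/d(composed) = 2 * (composed - target). Because composed is a sum,
    # every occurrence of a word in the definition receives this same gradient.
    grad = [2.0 * (c - t) for c, t in zip(composed, target)]
    # A word that occurs twice is updated twice, matching its doubled gradient.
    for w in definition:
        emb[w] = [x - lr * g for x, g in zip(emb[w], grad)]
    return loss


# Hypothetical toy embeddings; real ones would have hundreds of dimensions.
emb = {"young": [1.0, 0.0], "dog": [0.0, 1.0], "puppy": [0.5, 0.5]}
before = tune_step(["young", "dog"], "puppy", emb, lr=0.1)
after = sq_distance(compose(["young", "dog"], emb), emb["puppy"])
print(before, after)
```

Because this sketch's objective uses Euclidean rather than cosine distance, the update changes vector lengths as well as directions, so the tuned vectors compose by plain addition without any normalization step.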

In our analysis, we evaluate original embeddings, as well as tuned embeddings, using existing word similarity and sentence embedding evaluation methods. Aside from these evaluation methods used in related work, we also evaluate embeddings using a ranking method which tests composed vectors against the lexicographic definitions described above. In contrast to other evaluation methods, ours is not invariant to the magnitude of the embedding vector, which we show is important for composition. We consider this new evaluation method (CompVecEval) to be a key contribution.
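The abstract does not give CompVecEval's exact scoring. The sketch below only illustrates the general idea of a magnitude-sensitive ranking evaluation, assuming that each definition is composed by summation, all lemmas are ranked by distance to the composed vector, and the mean rank of the correct lemma is reported (lower is better). The tie-handling rule and the toy data are hypothetical. Running the same ranking with cosine distance shows how a magnitude-invariant measure can fail to separate lemmas that differ only in vector length.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]
Distance = Callable[[Vector, Vector], float]


def compose(words: List[str], emb: Dict[str, Vector]) -> Vector:
    """Additive composition: element-wise sum of the word vectors."""
    out = [0.0] * len(emb[words[0]])
    for w in words:
        for i, x in enumerate(emb[w]):
            out[i] += x
    return out


def euclidean(a: Vector, b: Vector) -> float:
    """Magnitude-sensitive distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Vector, b: Vector) -> float:
    """Magnitude-invariant distance: 1 - cos(a, b)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank_of_lemma(definition: List[str], lemma: str, candidates: List[str],
                  emb: Dict[str, Vector], dist: Distance,
                  eps: float = 1e-12) -> int:
    """Rank (1 = best) of the true lemma among all candidate lemmas.

    Ties, within a small tolerance for floating-point noise, count against
    the true lemma (pessimistic ranking, a hypothetical choice).
    """
    composed = compose(definition, emb)
    true_d = dist(composed, emb[lemma])
    closer_or_tied = sum(
        1 for c in candidates
        if c != lemma and dist(composed, emb[c]) <= true_d + eps
    )
    return 1 + closer_or_tied


def mean_rank(dataset: Dict[str, List[str]], emb: Dict[str, Vector],
              dist: Distance) -> float:
    """Mean rank of each lemma given its composed definition."""
    candidates = list(dataset)
    ranks = [rank_of_lemma(defn, lemma, candidates, emb, dist)
             for lemma, defn in dataset.items()]
    return sum(ranks) / len(ranks)


# Hypothetical toy data: "y" points in the same direction as "x" but is longer.
emb = {"a": [1.0, 0.0], "b": [0.0, 1.0], "x": [1.0, 1.0], "y": [3.0, 3.0]}
dataset = {"x": ["a", "b"], "y": ["a", "a", "a", "b", "b", "b"]}
print(mean_rank(dataset, emb, euclidean))        # 1.0
print(mean_rank(dataset, emb, cosine_distance))  # 2.0
```

In this toy case the Euclidean ranking places each lemma first, while cosine distance cannot tell "x" from "y" because they differ only in length. This illustrates why an evaluation that ignores magnitude may miss information that additive composition relies on.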
Document type Conference contribution
Language English
Published at https://doi.org/10.1145/3178876.3186007
Downloads
p1083-scheepers (Final published version)