CAMsterdam at SemEval-2019 Task 6: Neural and graph-based feature extraction for the identification of offensive tweets
| Authors | |
|---|---|
| Publication date | 2019 |
| Host editors | |
| Book title | The International Workshop on Semantic Evaluation : Proceedings of the Thirteenth Workshop |
| Book subtitle | NAACL HLT 2019 : June 6-June 7, 2019, Minneapolis, Minnesota, USA |
| ISBN (electronic) | |
| Event | 13th International Workshop on Semantic Evaluation, SemEval 2019, co-located with the 17th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2019 |
| Pages (from-to) | 556-563 |
| Number of pages | 8 |
| Publisher | Stroudsburg, PA: Association for Computational Linguistics |
| Organisations | |
| Abstract | We describe the CAMsterdam team entry to the SemEval-2019 Shared Task 6 on offensive language identification in Twitter data. Our proposed model learns to extract textual features using a multi-layer recurrent network, and then performs text classification using gradient-boosted decision trees (GBDT). A self-attention architecture enables the model to focus on the most relevant areas in the text. We additionally learn globally optimised embeddings for hashtags using node2vec, which are given as additional tweet features to the GBDT classifier. Our best model obtains 78.79% macro F1-score on detecting offensive language (subtask A), 66.32% on categorising offence types (targeted/untargeted; subtask B), and 55.36% on identifying the target of offence (subtask C). |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.18653/v1/S19-2100 |
| Other links | https://www.scopus.com/pages/publications/85093413882 |
| Downloads | S19-2100 (Final published version) |
| Permalink to this page | |
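
The abstract reports all three results as macro F1-scores. As a minimal sketch of how that metric is computed (the function name `macro_f1` and the toy labels below are illustrative assumptions, not taken from the paper or its data), the per-class F1 scores are averaged with equal weight, so a rare class counts as much as a frequent one:

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        # Define precision, recall, and F1 as 0.0 when their denominator is zero.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


# Toy example (made-up labels, for illustration only).
gold = ["OFF", "NOT", "NOT", "OFF", "NOT", "NOT"]
pred = ["OFF", "NOT", "OFF", "NOT", "NOT", "NOT"]
print(macro_f1(gold, pred))  # 0.625: F1 is 0.5 for OFF and 0.75 for NOT
```

Because each class contributes equally to the average, a classifier that always predicts the majority class scores poorly on this metric even when its accuracy is high.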