Time-aware authorship attribution for short text streams

Open Access
Authors
Publication date 2015
Book title SIGIR 2015
Book subtitle proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval: August 9-13, 2015, Santiago, Chile
ISBN (electronic)
  • 9781450336215
Event 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2015
Pages (from-to) 727-730
Publisher New York, NY: Association for Computing Machinery
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Identifying the authors of short texts on Internet- or social-media-based communication systems is an important tool against fraud and cybercrime. Besides the challenges raised by the limited length of these messages, the evolving language and writing styles of their authors make authorship attribution difficult. Most current short-text authorship attribution approaches address only the challenge of limited text length. However, neglecting the second challenge may lead to poor attribution performance for authors who change their writing styles.

In this paper, we analyse the temporal changes in word usage by authors of tweets and emails and, based on this analysis, propose an approach to estimate the dynamicity of authors' word usage. The proposed approach is inspired by time-aware language models and can be employed in any time-unaware authorship attribution method. Our experiments on tweets and the Enron email dataset show that the proposed time-aware authorship attribution approach significantly outperforms baselines that neglect the dynamicity of authors.
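
To make the idea of a time-aware language model concrete, the following is a minimal sketch and not the method of the paper, whose estimator of per-author dynamicity is not given in this record. It assumes a unigram model with additive smoothing in which each past message is down-weighted by exponential decay with a fixed rate; the function names, the `decay` and `alpha` parameters, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter

def decayed_counts(messages, now, decay):
    """Word counts where a message written at time t weighs exp(-decay * (now - t)).

    Exponential decay is an assumption of this sketch, not the paper's estimator.
    """
    counts = Counter()
    for t, text in messages:
        weight = math.exp(-decay * (now - t))
        for token in text.lower().split():
            counts[token] += weight
    return counts

def log_likelihood(text, counts, vocab_size, alpha=0.1):
    """Log-probability of text under a unigram model with additive smoothing."""
    total = sum(counts.values())
    return sum(
        math.log((counts[token] + alpha) / (total + alpha * vocab_size))
        for token in text.lower().split()
    )

def attribute(text, now, corpus, decay):
    """Return the best-scoring author and all per-author scores."""
    vocab = {tok for msgs in corpus.values() for _, m in msgs for tok in m.lower().split()}
    vocab |= set(text.lower().split())
    scores = {
        author: log_likelihood(text, decayed_counts(msgs, now, decay), len(vocab))
        for author, msgs in corpus.items()
    }
    return max(scores, key=scores.get), scores

# Made-up toy data: (timestamp, message) pairs per hypothetical author.
corpus = {
    "alice": [(0, "love football match"), (9, "love cycling trip")],
    "bob": [(0, "great cycling trip"), (9, "great football match")],
}

# decay=0 ignores time (the time-unaware case); decay>0 favours recent usage.
best_static, scores_static = attribute("cycling trip", now=10, corpus=corpus, decay=0.0)
best_timed, scores_timed = attribute("cycling trip", now=10, corpus=corpus, decay=0.5)
```

In this toy setting the two authors have used the query words equally often overall, so the time-unaware scores tie, while the decayed model prefers the author who used them most recently. Setting `decay` to zero recovers the time-unaware model, which reflects the abstract's point that time awareness can be layered onto an existing attribution method.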
Document type Conference contribution
Language English
Published at https://doi.org/10.1145/2766462.2767799