A Corpus for Entity Profiling in Microblog Posts

Open Access
Authors
  • D. Spina
  • E. Meij
  • A. Oghina
  • M.T. Bui
Publication date 2012
Host editors
  • A. Corujo
  • J. Gonzalo
  • E. Meij
  • M. de Rijke
  • I. Chugur
Book title Language Engineering for Online Reputation Management: 26 May 2012: proceedings
Event LREC Workshop on Information Access Technologies for Online Reputation Management
Pages (from-to) 30-34
Publisher Paris: European Language Resources Association (ELRA)
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Microblogs have become an invaluable source of information for the purpose of online reputation management. Streams of microblogs are of great value because of their direct and real-time nature. An emerging problem is to identify not only microblog posts (such as tweets) that are relevant for a given entity, but also the specific aspects that people discuss. Determining such aspects can be non-trivial because of creative language usage, the highly contextualized and informal nature of microblog posts, and the limited length of this form of communication. In this paper we present two manually annotated corpora to evaluate the task of identifying aspects on Twitter, both of them based upon the WePS-3 ORM task dataset and made available online. The first is created using a pooling methodology, for which we have implemented various methods for automatically extracting aspects from tweets that are relevant for an entity. Human assessors have labeled each of the candidates as being relevant. The second corpus is more fine-grained and contains opinion targets. Here, annotators consider individual tweets related to an entity and manually identify whether the tweet is opinionated and, if so, which part of the tweet is subjective and what the target of the sentiment is, if any.
Document type Conference contribution
Language English
Published at http://www.lrec-conf.org/proceedings/lrec2012/workshops/15.LREC%202012%20Online%20Reputation%20Proceedings.pdf