Twitter hashtags: joint translation and clustering

Authors
Publication date 2011
Book title Web Science 2011
Publisher ACM
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
The popularity of microblogging platforms, such as Twitter, renders
them valuable real-time information resources for tracking various
aspects of worldwide events, e.g., earthquakes, political elections,
etc. Such events are usually characterized in microblog posts
via the use of hashtags (#). As microbloggers come from different
backgrounds, and express themselves in different languages,
we witness different "translations" of hashtags which, however, are
about the same event. Language-dependent variants of hashtags
can lead to issues in content analysis. In this paper, we
propose a method for translating hashtags, which builds on methods
from information retrieval. The method introduced is independent of
the source and target languages. Our method is desirable, either
instead of or complementary to the direct translation of the hashtag,
for three reasons. First, we return a list of hashtags on the
same topic, which takes into account the plurality and variability
of hashtags used by microbloggers for assigning posts to a topic.
Second, our framework accounts for the problem that microbloggers
in different languages will refer to the same topic using different
tokens. Finally, our method does not require special preprocessing
of hashtags, reducing barriers to real-world implementation.
We present proof-of-concept results for the Spanish
hashtag #33mineros.
Document type Conference contribution
Language English
Published at http://www.websci11.org/fileadmin/websci/Posters/125_paper.pdf