The advent of social media has established a symbiotic relationship between social media and online news. This relationship
can be leveraged for tracking news content and predicting behavior, with tangible real-world applications such as online reputation
management, ad pricing, news ranking, and media analysis. In this thesis we focus on tracking news content in social media,
and predicting user behavior.
In the first part, we develop methods for tracking content that build upon and extend
practices in Information Retrieval. We begin by discovering social media posts that discuss a news article but do not
provide a hyperlink to it. Our methods model news articles using several channels of information, either endogenous or exogenous
to the article. These models are then used to query an index of social media posts. During this process we found that the
query models are close in size to the documents to be retrieved, violating a standard assumption of language modeling. We
correct for this discrepancy by introducing two hypergeometric language models for modeling both queries and documents to
In the second part, we focus on predicting behavior. First, we look at predicting listeners’ preference in
spoken user-generated content, namely podcasts. Then, we predict the popularity of news articles from several news agents in
terms of the volume of comments they receive. We develop models for predicting the popularity of an article both before
and after it is published. Finally, we look at a different aspect of news impact: how reading a news article affects future
user browsing behavior. In each setting, we find patterns that characterize the underlying behavior and extract features that
we then use to establish models for predicting online behavior.