Personalized Retweet Prediction in Twitter

Liangjie Hong, Aziz Doumith, and Brian D. Davison

A more complete version of this work has been published at WSDM 2013.

From the Introduction
In social network services like Twitter, LinkedIn and Facebook, users are informed instantly via rich multimedia content from their social connections. However, when facing a large amount of content from those connections, users are simply unable to consume it effectively and efficiently, leading to the problem of information overload. On the other hand, the information a user sees is usually limited in scope to the user's social connections. Thus, it is difficult for a user to obtain information distributed outside of their circle, even though it might match their interests, leading to a problem of information shortage. In both cases, users may spend a significant amount of time filtering and searching for relevant information in social media; thus, it is also very important to understand how users interact with these systems. Interactions between users and social media occur through a variety of actions such as posting, re-posting, replying and commenting. Ideally, social media services would be able to filter and recommend content to users based on their history of previous interactions and interests. This area has attracted the attention of academic and industrial research communities.

The task of understanding users' behaviors and their interests poses a number of challenges. First, although the number of items (updates, tweets, etc.) generated by users in these services is huge, users interact with only a few of them, making the interaction data sparse. Second, new users and new content items flow into the system continuously. Thus, the 'cold start' problem tends to be more severe in these social platforms than in traditional information systems. In addition, the tremendous amount of content is rich yet noisy. Simple information retrieval or topic modeling techniques may not be sufficient to capture users' interests.

The problem tackled in this work has strong links to research in recommender systems and collaborative filtering. However, social content systems are much more dynamic than traditional recommender systems: many new items are pushed into the system every second. Therefore, recommender systems should be adapted to this novel situation. Traditional successful collaborative filtering models are based on latent factor models (LFM), partially due to their superior performance in the Netflix competition. However, standard LFMs are built around a user-item interaction matrix and cannot easily handle arbitrary features. Although some newly proposed frameworks based on LFM can consider features, fundamental modeling assumptions prevent them from handling high-order interaction data (e.g., tensors). In addition, current extensions to LFM that incorporate rich text information are usually cumbersome, requiring complicated inference algorithms that cannot scale to large datasets. Moreover, researchers in collaborative filtering are realizing that pointwise measurements may no longer be appropriate, and so a handful of ranking-based metrics have been proposed. However, no work to date has compared them systematically on real-world datasets.

In this work, we study the problem of modeling users' behaviors by focusing on one particular decision, retweeting, in Twitter, and we try to understand users' interests. Our method can be easily extended to model multiple types of user decisions as well. We use a state-of-the-art recommendation model, Factorization Machines, to model user decisions and user-generated content simultaneously.
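
To make the model family concrete, the following is a minimal sketch of how a generic second-order Factorization Machine scores one sparse feature vector. It is not the authors' implementation: the feature layout (one index each for a user, a tweet author, and a term), the parameter values, and the name `fm_predict` are all hypothetical, and training is omitted entirely.

```python
from typing import Dict, List


def fm_predict(x: Dict[int, float], w0: float,
               w: List[float], V: List[List[float]]) -> float:
    """Score a sparse feature vector with a second-order Factorization Machine.

    x  maps feature index -> value (only non-zero features are listed)
    w0 is the global bias, w the per-feature weights,
    V  holds one k-dimensional latent vector per feature.
    """
    # Linear part: sum_i w_i * x_i
    linear = sum(w[i] * xi for i, xi in x.items())

    # Pairwise part: sum_{i<j} <v_i, v_j> x_i x_j, computed per latent
    # dimension as 0.5 * ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2).
    k = len(V[0])
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * xi for i, xi in x.items())
        s_sq = sum((V[i][f] * xi) ** 2 for i, xi in x.items())
        pairwise += 0.5 * (s * s - s_sq)

    return w0 + linear + pairwise


# Hypothetical example: index 0 = a user, 1 = a tweet author, 2 = a term.
x = {0: 1.0, 1: 1.0, 2: 1.0}
w0 = 0.5
w = [0.1, 0.2, 0.3]
V = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]]
score = fm_predict(x, w0, w, V)
print(score)
```

Because every feature, whether it identifies a user or describes content, gets its own latent vector, interactions between any pair of features are modeled through the same inner product; this is what lets a single model cover both decisions and content.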

Presented at the Fourth Workshop on Information in Networks (WIN), NYU, NYC, September 2012. However, please cite the more complete version published at WSDM 2013.



Last modified: 5 December 2012
Brian D. Davison