Empirical Study of Topic Modeling in Twitter

Liangjie Hong and Brian D. Davison

Full Paper (9 pages)
Official ACM published version: http://dx.doi.org/10.1145/1964858.1964870
Author's version: PDF (158KB)

Abstract

Social networks such as Facebook, LinkedIn, and Twitter have been a crucial source of information for a wide spectrum of users. In Twitter, popular information that is deemed important by the community propagates through the network. Studying the characteristics of content in the messages becomes important for a number of tasks, such as breaking news detection, personalized message recommendation, friend recommendation, sentiment analysis, and others. While many researchers wish to use standard text mining tools to understand messages on Twitter, the restricted length of those messages prevents these tools from being employed to their full potential. We address the problem of using standard topic models in micro-blogging environments by studying how the models can be trained on the dataset. We propose several schemes to train a standard topic model and compare their quality and effectiveness through a set of carefully designed experiments from both qualitative and quantitative perspectives. We show that by training a topic model on aggregated messages we can obtain a higher-quality learned model, which results in significantly better performance in two real-world classification problems. We also discuss how the state-of-the-art Author-Topic model fails to model hierarchical relationships between entities in social media.
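To illustrate what training on aggregated messages might involve, the following is a minimal sketch of one possible aggregation step: concatenating all messages from the same author into a single longer pseudo-document before passing the result to a topic model. The abstract does not specify the aggregation schemes, so grouping by author is an assumption made here for illustration, and the function name `aggregate_by_author` and the toy data are hypothetical.

```python
from collections import defaultdict


def aggregate_by_author(messages):
    """Concatenate each author's messages into one pseudo-document.

    `messages` is an iterable of (author, text) pairs. Grouping by
    author is an assumed scheme, not necessarily the paper's.
    """
    grouped = defaultdict(list)
    for author, text in messages:
        grouped[author].append(text)
    # Join in arrival order so each author maps to one longer document.
    return {author: " ".join(texts) for author, texts in grouped.items()}


# Hypothetical toy data standing in for short messages.
messages = [
    ("alice", "earthquake reported downtown"),
    ("bob", "new phone review"),
    ("alice", "aftershock felt minutes later"),
]

documents = aggregate_by_author(messages)
```

Each value in `documents` can then be treated as one training document for a standard topic model, giving the model longer texts than individual messages provide.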

In Proceedings of the First Workshop on Social Media Analytics (SOMA), pages 80-88, Washington, DC, July 2010.


Last modified: 18 June 2011
Brian D. Davison