Poster Paper (4 pages)
Official AAAI published version: Available here
Author's version: PDF (186KB)
Manufacturers of TV sets have recently started adding social media features to their products. Some of these products display microblogging messages relevant to the TV show which the user is currently watching. However, such systems suffer from low precision and recall when they use the title of the show to search for relevant messages. Titles of some popular shows such as Lost or Survivor are highly ambiguous, resulting in messages unrelated to the show. Thus, there is a need to develop filtering algorithms that can achieve both high precision and recall. Filtering microblogging messages for Social TV poses several challenges, including lack of training data, lack of proper grammar and capitalization, lack of context due to text sparsity, etc.
We describe a bootstrapping algorithm which uses a small manually labeled dataset, a large dataset of unlabeled messages, and some domain knowledge to derive a high precision classifier that can successfully filter microblogging messages which discuss television shows. The classifier is designed to generalize to TV shows which were not part of the training set. The algorithm achieves high precision on our two test datasets and successfully generalizes to unseen television shows. Furthermore, it compares favorably to a text classifier specifically trained on the television shows used for testing.
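The abstract does not spell out the bootstrapping procedure, so the following is only a generic self-training sketch of the general idea (small labeled set plus large unlabeled set), not the paper's algorithm. The Naive Bayes scoring, the `margin` and `rounds` parameters, and all function names are assumptions made for illustration, and the domain-knowledge component mentioned above is omitted.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; microblog text rarely has reliable casing.
    return text.lower().split()


def train(labeled):
    """Collect per-class word counts from (text, is_tv_related) pairs."""
    counts = {True: Counter(), False: Counter()}
    docs = {True: 0, False: 0}
    for text, label in labeled:
        counts[label].update(tokenize(text))
        docs[label] += 1
    return counts, docs


def log_odds(model, text):
    """Naive Bayes log-odds that a message is TV-related (add-one smoothing)."""
    counts, docs = model
    vocab = set(counts[True]) | set(counts[False])
    total_pos = sum(counts[True].values()) + len(vocab)
    total_neg = sum(counts[False].values()) + len(vocab)
    score = math.log((docs[True] + 1) / (docs[False] + 1))
    for word in tokenize(text):
        if word not in vocab:
            continue  # words never seen in training carry no evidence
        p_pos = (counts[True][word] + 1) / total_pos
        p_neg = (counts[False][word] + 1) / total_neg
        score += math.log(p_pos / p_neg)
    return score


def is_relevant(model, text):
    return log_odds(model, text) > 0


def bootstrap(labeled, unlabeled, margin=2.0, rounds=5):
    """Self-training: repeatedly add confidently classified messages to the training set."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        model = train(labeled)
        confident, rest = [], []
        for text in pool:
            score = log_odds(model, text)
            if abs(score) >= margin:
                confident.append((text, score > 0))
            else:
                rest.append(text)
        if not confident:
            break  # nothing new to learn from
        labeled.extend(confident)
        pool = rest
    return train(labeled)
```

Calling `bootstrap(labeled, unlabeled)` with a handful of hand-labeled messages and a list of unlabeled ones returns a model that `is_relevant` can query. Raising `margin` trades coverage of the unlabeled pool for cleaner automatically assigned labels, which is the usual lever when precision matters more than recall.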
In Proceedings of the 5th International AAAI Conference on Weblogs and Social Media (ICWSM), pages 462-465, Barcelona, Spain, July 2011.
© AAAI, 2011. This is the author's version of the work. Not for redistribution.
Back to Brian Davison's publications