Research Areas

All publications can be found here and on Google Scholar.
Opinion spam detection
Online media platforms, such as Yelp, TripAdvisor and Amazon, are full of opinionated information that can easily and significantly influence the decisions of a large number of customers. Because reviews are so influential, dishonest businesses have adopted unethical or even illegal marketing strategies, paying large numbers of reviewers (spammers) to post fake reviews (opinion spam) that promote or demote businesses on these platforms. We have adopted network-based and time-series-based methods to address the issue [ ICDM2011 KDD2012 ], especially with the help of multi-modal review data [ BigData2016b BigData2015 ].
Ensemble and model fusion
An ensemble of multiple models, if fused properly, can provide more predictive power than any of its constituent models. Traditional ensemble methods, which have a long history, study the fusion of a small number of predictive models for binary and multi-class prediction. We move this field forward by targeting the fusion of many unidentifiable predictive models, such as crowdsourcing workers, whose outputs are sparse and structured, such as sets, rankings and trees. The challenges are to gauge each individual model's performance and to take into account the extra knowledge about the output space. Please check out these three papers [ ICDM2013 DSAA2015 CIKM2016a ] along with others [ SDM2012 KDD2014 SDM2015b ].
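As a rough illustration of the first challenge (gauging each model's performance without ground truth), the sketch below alternates between a weighted vote and re-estimating each worker's reliability from agreement with the current consensus. It is a generic textbook-style heuristic for flat labels, not the method of the papers cited above; the function name `fuse_votes`, the smoothing constants, and the toy data are all assumptions made for this example.

```python
from collections import Counter


def fuse_votes(votes, n_rounds=5):
    """Fuse sparse categorical votes from many workers.

    votes: dict mapping item -> {worker: label}; a worker may skip items.
    Returns (consensus, weights), where weights[worker] is a smoothed
    estimate of how often that worker agrees with the consensus.
    """
    workers = {w for by_worker in votes.values() for w in by_worker}
    weights = {w: 1.0 for w in workers}  # start from a plain majority vote
    consensus = {}
    for _ in range(n_rounds):
        # Step 1: weighted vote per item (ties go to the smallest label).
        for item, by_worker in votes.items():
            tally = Counter()
            for worker, label in by_worker.items():
                tally[label] += weights[worker]
            consensus[item] = max(sorted(tally), key=tally.get)
        # Step 2: re-estimate each worker's reliability from agreement
        # with the consensus, with add-one smoothing (an assumption here).
        agree, total = Counter(), Counter()
        for item, by_worker in votes.items():
            for worker, label in by_worker.items():
                total[worker] += 1
                agree[worker] += int(label == consensus[item])
        weights = {w: (agree[w] + 1) / (total[w] + 2) for w in workers}
    return consensus, weights


# Toy data: w1 and w2 usually agree, w3 usually disagrees with them.
toy_votes = {
    "a": {"w1": "spam", "w2": "spam", "w3": "ham"},
    "b": {"w1": "ham", "w2": "ham", "w3": "spam"},
    "c": {"w1": "spam", "w3": "ham"},
}
consensus, weights = fuse_votes(toy_votes)
```

On item "c" the plain majority vote is tied, and the learned weights are what break the tie in favour of the more reliable worker. Structured outputs such as sets, rankings and trees would need a different agreement measure than exact label equality.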
Extreme multi-labeled learning
Multi-labeled learning is a technique for assigning more than one semantic concept to a data item, and it has found wide application in areas such as bioinformatics, healthcare, e-commerce and social media. Big data has changed the landscape of multi-labeled learning by increasing the number of labels to an unprecedented scale. For example, there are tens of thousands of tags (as labels) for texts on Stack Overflow and Yahoo Answers, and millions of tags for images on Flickr. Extreme multi-labeled learning tries to scale up traditional multi-labeled learning to handle a large number of labels with varying importance, sparsity and informativeness. My current research addresses several aspects of this scalability challenge with the help of NLP, knowledge bases and crowdsourcing. See [ BIGDATA2016a CIKM2016b SDM2016a ] for the ongoing research on the topic.