Research Areas
|
|
|
|
Explainable and fair ML on graphs
|
|
|
Graphs are ubiquitous in applications such as molecular biochemistry, neuroscience, the Internet, computer vision, NLP, and crowdsourcing
([ICDM2021c]).
Machine learning on graphs, especially with neural networks, has demonstrated accurate predictive power. PI Xie is investigating explainability and fairness beyond accuracy.
(i) On large graphs, power-law degree distributions are common and can lead to fairness issues in the graphical models and affect end-users.
We propose a linear system to certify whether multiple desired fairness criteria can be fulfilled simultaneously and, if not,
a multi-objective optimization algorithm to find Pareto fronts for efficient trade-offs among the criteria
([CIKM2021]).
To reduce optimization cost, the team proposes continuous Pareto front exploration
by exploiting the smoothness of the set of Pareto optima.
(ii) Graphical models can be hard for human users to understand due to multiplexed information propagation over many edges.
The team has published a series of works addressing challenges in making graphical models more interpretable,
such as
large discrete search spaces
([ICDM2019]),
axiomatic attribution
([CIKM2020]),
multi-objective explanations
([ICDM2021a]),
and robustness of explanations via constrained optimization
([ICDM2021b]).
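To make the notion of a Pareto front over fairness criteria concrete, here is a minimal sketch. It is a generic non-dominated filter, not the algorithm from the cited papers; the candidate scores and the names `dominates` and `pareto_front` are hypothetical, and every criterion is assumed to be a cost where lower is better.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` on every criterion and strictly
    better on at least one (lower is better for all criteria)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Sequence[float]]) -> List[int]:
    """Return the indices of points that no other point dominates."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical models scored on two unfairness measures (illustrative values).
candidates = [(0.10, 0.90), (0.30, 0.40), (0.50, 0.50), (0.80, 0.05), (0.30, 0.60)]
front = pareto_front(candidates)
print(front)  # indices of the non-dominated candidates
```

Every candidate left on the front represents a different trade-off: improving one criterion from that point requires giving up some of another. This brute-force filter compares all pairs, so it only suits small candidate sets.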
|
|
|
Trustworthy fraud detection
|
|
|
Online media platforms, such as Yelp, TripAdvisor, and Amazon, are full of opinionated information that can significantly influence
a large number of customers' decisions.
Due to the "word-of-mouth" effect,
dishonest businesses have adopted unethical or even illegal marketing strategies by paying
spammers to post fake reviews (opinion spam) to promote or demote target businesses and products, leading to trustworthiness issues
with online content.
To address the issue, trustworthy (defined by AIR = "Accurate, Interpretable, and Robust") fraud detection is required (sketched in
[CIC2018]).
We have adopted propagation over networks
[ICDM2011],
temporal patterns
[KDD2012],
text features
[DSAA2015], and
multi-source data
[BigData2016a,
BigData2015].
Spam detectors are also constantly under attack from adversarial spammers in changing environments, so robust detectors
are critical
[BigData2018,
KDD2020].
|
|
|
Ensemble and model fusion
|
|
|
An ensemble of multiple models, if fused properly, can provide more predictive power than any constituent model.
Traditional ensemble methods, with a long history, studied the fusion of a small number of predictive models for binary and multi-class prediction.
We move this field forward by targeting the fusion of many unidentifiable predictive models,
such as crowdsourcing workers,
with sparse and structured outputs such as sets, rankings, and trees.
The challenges are to gauge each individual model's performance
and to take into account the extra knowledge of the output space. See
these three papers
[ICDM2013,
DSAA2015,
CIKM2016a]
along with others
[SDM2012,
KDD2014,
SDM2015b].
Together with Dr. Qi Li from Iowa State, we extended the framework to address the fusion problem on sequential data found in natural language processing
[ICDM2021_c].
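The challenge of gauging each model's performance without ground truth can be illustrated with a small sketch. It assumes plain categorical labels and uses a generic iterative weighted vote, not the methods of the cited papers, and it does not handle structured outputs such as rankings or trees. The function name `fuse_votes`, the worker names, and the votes are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Tuple


def fuse_votes(votes: Dict[Tuple[str, str], Hashable], rounds: int = 5):
    """Fuse labels from several workers whose reliability is unknown.

    `votes` maps (worker, item) -> label. Returns (consensus, weights).
    """
    workers = sorted({w for w, _ in votes})
    items = sorted({i for _, i in votes})
    weights = {w: 1.0 for w in workers}  # start by trusting everyone equally
    consensus = {}
    for _ in range(rounds):
        # Step 1: weighted vote per item under the current worker weights.
        for item in items:
            scores = defaultdict(float)
            for (w, i), label in votes.items():
                if i == item:
                    scores[label] += weights[w]
            # Sorting the labels first makes tie-breaking deterministic.
            consensus[item] = max(sorted(scores, key=str), key=scores.get)
        # Step 2: re-estimate each weight as smoothed agreement with consensus.
        for w in workers:
            answered = [(i, l) for (ww, i), l in votes.items() if ww == w]
            agree = sum(1 for i, l in answered if consensus[i] == l)
            weights[w] = (agree + 1) / (len(answered) + 2)  # Laplace smoothing
    return consensus, weights


# Hypothetical votes: w1 and w2 mostly agree, w3 mostly disagrees.
votes = {
    ("w1", "a"): 1, ("w1", "b"): 0, ("w1", "c"): 1, ("w1", "d"): 0,
    ("w2", "a"): 1, ("w2", "b"): 0, ("w2", "c"): 1, ("w2", "d"): 1,
    ("w3", "a"): 0, ("w3", "b"): 1, ("w3", "c"): 0, ("w3", "d"): 1,
}
consensus, weights = fuse_votes(votes)
print(consensus, weights)
```

Alternating between estimating the consensus and estimating worker reliability is a common starting point when no labeled data is available; it assumes that most workers are right more often than wrong.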
|
|
|
Extreme Multi-labeled Learning
|
|
|
Multi-labeled learning is a technique to assign more than one semantic concept to a data item and has found wide application in areas
such as bioinformatics, healthcare, e-commerce, and social media.
Big data have changed the landscape of multi-labeled learning by increasing the number of labels to an unprecedented scale.
For example, there are tens of thousands of tags (as labels) for texts
on Stack Overflow and Yahoo Answers, and
millions of tags for images on Flickr.
Extreme multi-labeled learning tries to scale up traditional multi-labeled learning
to handle a large number of labels with varying importance, sparsity, and informativeness.
My current research addresses the scalability challenges in many aspects
with the help of NLP, knowledge bases, and crowdsourcing. See
[BIGDATA2016a,
CIKM2016b,
SDM2016a]
for the ongoing research on the topic.
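As a small illustration of the multi-labeled setting, the sketch below assigns the k highest-scoring tags to one item and computes precision@k, a metric commonly reported when the label space is very large. The scores, tags, and function names are hypothetical, and nothing here is specific to the cited papers.

```python
import heapq
from typing import Dict, List, Set


def top_k_labels(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k labels with the highest scores, best first."""
    return heapq.nlargest(k, scores, key=scores.get)


def precision_at_k(predicted: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the first k predicted labels that are truly relevant."""
    return sum(1 for label in predicted[:k] if label in relevant) / k


# Hypothetical scores a model assigned to candidate tags for one question.
scores = {"python": 0.91, "pandas": 0.75, "java": 0.08, "numpy": 0.64, "css": 0.02}
relevant = {"python", "numpy", "dataframe"}  # hypothetical ground-truth tags

predicted = top_k_labels(scores, 3)
p3 = precision_at_k(predicted, relevant, 3)
print(predicted, p3)
```

Using `heapq.nlargest` avoids sorting the full label set, which matters when only a handful of labels are needed out of a very large vocabulary.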
|
Funding
|
We are thankful to the following funding agencies for their support of our research.
- CAREER: Bilevel Optimization for Accountable Machine Learning on Graphs (NSF IIS-2145922)
- Program in the Foundations and Applications of Mathematical Optimization and Data Science (Lehigh Research Future Grant)
- Efficient, explainable and robust data scientific methods for smart engineering systems (Lehigh Accelerator Grant)
- Algorithms, systems, and theories for exploiting data dependencies in crowdsourcing (NSF IIS-2008155)
- Learning Dynamic and Robust Defenses Against Co-Adaptive Spammers (NSF CNS-1931042)