Measuring Similarity to Detect Qualified Links

Xiaoguang Qi, Lan Nie and Brian D. Davison

Full Paper (8 pages)
Official ACM published version: http://doi.acm.org/10.1145/1244408.1244418
Author's version: PDF (316KB)

Abstract
The early success of link-based ranking algorithms was predicated on the assumption that links imply merit of the target pages. However, today many links exist for purposes other than to confer authority. Such links bring noise into link analysis and harm the quality of retrieval. In order to provide high quality search results, it is important to detect them and reduce their influence. In this paper, a method is proposed to detect such links by considering multiple similarity measures over the source pages and target pages. With the help of a classifier, these noisy links are detected and dropped. After that, link analysis algorithms are performed on the reduced link graph. The usefulness of a number of features is also tested. Experiments across 53 query-specific datasets show our approach almost doubles the performance of Kleinberg's HITS and boosts Bharat and Henzinger's imp algorithm by close to 9% in terms of precision. It also outperforms a previous approach focusing on link farm detection.
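The pipeline described in the abstract (score each link by source/target similarity, drop the links judged noisy, then run link analysis on what remains) can be illustrated with a minimal Python sketch under simplifying assumptions. This is not the authors' implementation. The paper trains a classifier over multiple similarity features, but this sketch substitutes a single bag-of-words cosine similarity between source and target page text and a hypothetical fixed threshold (0.1) as the decision rule. The function names `cosine_similarity`, `filter_links`, and `hits`, and the toy pages, are illustrative only. After filtering, Kleinberg's HITS runs on the reduced graph.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Bag-of-words cosine similarity between two page texts (one illustrative feature)."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def filter_links(links, page_text, threshold=0.1):
    """Keep a link only if its endpoints are similar enough.

    Hypothetical stand-in for the paper's trained classifier: a single
    similarity feature compared against an assumed fixed threshold.
    """
    return [(src, dst) for src, dst in links
            if cosine_similarity(page_text[src], page_text[dst]) >= threshold]


def hits(links, iterations=50):
    """Kleinberg's HITS: iterate authority and hub scores with L2 normalization."""
    nodes = {n for link in links for n in link}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of a page = sum of hub scores of pages linking to it.
        auth = {n: 0.0 for n in nodes}
        for src, dst in links:
            auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub score of a page = sum of authority scores of pages it links to.
        hub = {n: 0.0 for n in nodes}
        for src, dst in links:
            hub[src] += auth[dst]
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return auth, hub


# Toy example (hypothetical pages): the casino page's link to "b" is off-topic.
pages = {
    "a": "jaguar car dealer reviews",
    "b": "jaguar car prices and reviews",
    "c": "cheap casino poker online",
    "d": "jaguar car parts dealer",
}
links = [("a", "b"), ("d", "b"), ("c", "b"), ("a", "d")]

kept = filter_links(links, pages)
print(kept)  # [('a', 'b'), ('d', 'b'), ('a', 'd')]
auth, hub = hits(kept)
```

In the toy graph the link from the unrelated casino page shares no terms with its target, so it is dropped before HITS runs, and page `b` ends up with the highest authority. A realistic version would replace the threshold rule with a classifier trained on several similarity measures, as the abstract describes; the single-feature version only shows where link filtering sits relative to link analysis.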

In Proceedings of the 3rd International Workshop on Adversarial Information Retrieval for the Web (AIRWeb), pages 49-56, Banff, Canada, May 2007. ACM Press.

© ACM, 2007. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution.



Last modified: 20 November 2008
Brian D. Davison