Measuring Similarity to Detect Qualified Links

Full Paper: PDF (20 pages, 446KB)
Xiaoguang Qi, Lan Nie and Brian D. Davison

Abstract
Link-based ranking algorithms succeed on the assumption that links imply merit of the target pages. On the real web, however, links exist for purposes other than to confer authority. Such links introduce noise into link analysis and harm retrieval quality, so providing high-quality search results requires detecting them and reducing their influence. In this paper, we propose a method that detects such links by computing multiple similarity measures over the source and target pages. A classifier trained on these measures identifies noisy links, which are then dropped, and link analysis algorithms are run on the reduced link graph. The usefulness of a number of features is also tested. Experiments across 53 query-specific datasets show that our approach boosts Bharat and Henzinger's imp algorithm by around 9% in terms of precision. It also outperforms a previous approach focused on link spam detection.
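To make the pipeline concrete, here is a minimal sketch of the idea: score each link by the similarity of its source and target pages, drop links judged noisy, and keep the rest for link analysis. The TF cosine similarity measure, the simple threshold "classifier", and the example pages below are illustrative assumptions, not the paper's actual features or trained classifier.

```python
# Sketch (not the authors' implementation): prune links whose source and
# target pages look topically unrelated, then hand the reduced graph to a
# link-analysis algorithm. The TF cosine measure and fixed threshold stand
# in for the paper's multiple similarity features and trained classifier.
from collections import Counter
from math import sqrt

def cosine_similarity(text_a, text_b):
    """Term-frequency cosine similarity between two bags of words."""
    a, b = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def prune_links(pages, links, threshold=0.1):
    """Keep only links whose endpoint pages exceed the similarity threshold."""
    return [(src, tgt) for (src, tgt) in links
            if cosine_similarity(pages[src], pages[tgt]) >= threshold]

# Hypothetical example: page "c" is off-topic, so the a->c link is dropped.
pages = {
    "a": "link analysis web ranking authority",
    "b": "web ranking and link analysis methods",
    "c": "cheap pills buy now discount offer",
}
links = [("a", "b"), ("a", "c")]
reduced = prune_links(pages, links)  # [("a", "b")]
```

The reduced edge list would then feed an algorithm such as HITS or Bharat and Henzinger's imp in place of the full link graph.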

Technical Report LU-CSE-06-033, Dept. of Computer Science and Engineering, Lehigh University, December, 2006.

An updated version of "Measuring Similarity to Detect Qualified Links" was published in the Proceedings of the 3rd International Workshop on Adversarial Information Retrieval on the Web (AIRWeb). Please cite that version instead.



Last modified: 4 April 2007 Brian D. Davison