Graduate Student Posters 2007


13. Recognizing Anchor Text Patterns on the Web

Author: Shruti Bhandari

Anchor text, the visible text of a hyperlink, gives a visitor concise information about the page it links to. Used wisely, it can improve a page's ranking in search engines. Earlier studies have examined the features of anchor text in detail, for example by measuring the similarity between anchor text, user queries submitted to search engines, and the titles of web pages. A considerable amount of research has also applied anchor text information to improve search results.

Our task is to extract anchor text patterns on the web. We indexed anchor texts and the target pages they point to by parsing a million pages from a recent crawl of UK web pages. Given an anchor text, we will group the unique target pages it points to. Similarly, given a target page, we will group the unique anchor texts that point to it. We will provide a user interface for carrying out these tasks. We will also find all unique anchor texts in the dataset, together with their corresponding target pages, and plot distribution curves from these results. The patterns obtained should help us analyze the link structure and the anchor text distribution on the web. This, in turn, should be useful for automatic web classification, for identifying user goals in queries, for refining user queries, and for dealing with link bombing.
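
The two groupings described above (anchor text to unique target pages, and target page to unique anchor texts) can be illustrated with a minimal Python sketch. It is not the implementation used in this work. The class and function names (`AnchorExtractor`, `build_indexes`, `size_distribution`), the lowercasing and whitespace normalization of anchor text, and the sample pages are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorExtractor(HTMLParser):
    """Collects (anchor text, absolute target URL) pairs from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pairs = []
        self._href = None   # target of the <a> element currently open, if any
        self._text = []     # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the URL of the source page.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Assumption: anchor texts are compared case-insensitively,
            # with runs of whitespace collapsed to a single space.
            text = " ".join("".join(self._text).split()).lower()
            if text:
                self.pairs.append((text, self._href))
            self._href = None


def build_indexes(pages):
    """Builds both groupings from an iterable of (page URL, HTML) pairs."""
    anchor_to_targets = defaultdict(set)
    target_to_anchors = defaultdict(set)
    for url, html in pages:
        parser = AnchorExtractor(url)
        parser.feed(html)
        parser.close()
        for text, target in parser.pairs:
            anchor_to_targets[text].add(target)
            target_to_anchors[target].add(text)
    return anchor_to_targets, target_to_anchors


def size_distribution(index):
    """Maps a group size to the number of keys that have that many unique values."""
    return Counter(len(values) for values in index.values())


# Hypothetical sample pages, used only to exercise the sketch.
pages = [
    ("http://example.co.uk/",
     '<a href="/about">About us</a> <a href="http://example.org/">Click here</a>'),
    ("http://example.co.uk/news",
     '<a href="/about">about  us</a> <a href="/contact">Click here</a>'),
]
anchor_to_targets, target_to_anchors = build_indexes(pages)
anchor_distribution = size_distribution(anchor_to_targets)
target_distribution = size_distribution(target_to_anchors)
```

Storing the values as sets keeps only unique targets per anchor text and unique anchor texts per target. The counts returned by `size_distribution` are the kind of data a distribution curve could be plotted from. A crawl of a million pages would likely need an on-disk index rather than in-memory dictionaries.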