Topical Locality in the Web: Experiments and Observations
Technical Report (22 pages)
PDF (185KB)
Brian D. Davison
May 2000
Abstract
Most web pages are linked to others with related content. This idea,
combined with another that says that text in, and possibly around,
HTML anchors describes the pages to which they point, is the foundation
for a usable World-Wide Web. In this paper, we examine to what extent
these ideas hold by empirically testing whether topical locality
mirrors spatial locality of pages on the Web. In particular, we find that
the likelihood of linked pages having similar textual content is high;
the similarity of sibling pages increases when the links from the parent
are close together; titles, descriptions, and anchor text represent at
least part of the target page; and that anchor text may be a useful
discriminator among unseen child pages. These results present the
foundations necessary for the success of many web systems, including search
engines, focused crawlers, linkage analyzers, and intelligent web agents.
Technical Report DCS-TR-414, Department of Computer Science, Rutgers
University, May 2000.
A shorter version of this paper is available as a
conference paper.
Last modified: 11 January 2001