| Topic |
Readings (for 397 & 497) |
Readings for 497 |
| Introduction |
USE 1, FOA 1 |
MIR 1 |
| Text preparation |
USE 2.1-2.4, FOA 2.2-2.4 |
MIR 7.1-7.2 |
| Indexing |
USE 2.5, FOA 2.5 |
MIR 6.1-6.3, 8.1-8.3, 13.4.6 |
| Information retrieval and ranking models
Vector space model
Dimension reduction |
USE 3, 4; FOA 3, 5.2 |
MIR 2.1-2.5, 13.4.4, 13.5-13.7 |
| Query languages and operations |
USE 5, 6.2; FOA 4.2 |
MIR 4, 5, 10.5 |
| Performance evaluation |
USE 6.1; FOA 4.3 |
MIR 3 |
| Hypertext properties of the WWW |
|
MIR 2.10.3, 6.4 |
Scaling
Parallel and distributed IR
Caching
Compression |
- A Web caching primer, Davison, 2001 (background)
- On Caching Search Engine Query Results, Markatos, 2000
- Rank-preserving two level caching for scalable search engines, Saraiva et al., 2001
- Locality in search engine queries and its implications for caching, Xie and O'Hallaron, 2002
- Lessons from Giant-Scale Services, Brewer, 2001 (draft; IEEE published version)
- Server-side design principles for scalable internet systems, Roe and Gonik, 2002 (local copy)
|
MIR 7.4-7.6, 8.8, 9, 13.4.1-2
|
Web crawling
|
FOA 8.1
- Parallel Crawlers, Cho and Garcia-Molina, 2002
- High-Performance Web Crawling, Najork and Heydon, 2001
- Design and Implementation of a High-Performance Distributed Web Crawler, Shkapenyuk and Suel, 2001
- UbiCrawler: A Scalable Fully Distributed Web Crawler, Boldi et al., 2002
- How dynamic is the web?, Brewington and Cybenko, 2000
- The Evolution of the Web and Implications for an Incremental Crawler, Cho and Garcia-Molina, 2000
- Crawling the Hidden Web, Raghavan and Garcia-Molina, 2001
|
MIR 13.4.5
|
Link analysis
|
FOA 6.1-6.3
- The PageRank Citation Ranking: Bringing Order to the Web, Page et al., 1998
- Authoritative Sources in a Hyperlinked Environment, Kleinberg, 1997-1999
- Improved Algorithms for Topic Distillation in a Hyperlinked Environment, Bharat and Henzinger, 1998
- Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text, Chakrabarti et al., 1998
- The Connectivity Server: fast access to linkage information on the Web, Bharat et al., 1998
- Inferring Web Communities from Link Topology, Gibson et al., 1998
- Trawling the web for emerging cyber-communities, Kumar et al., 1999
- Finding Related Pages in the World Wide Web, Dean and Henzinger, 1999
- Focused crawling: a new approach to topic-specific Web resource discovery, Chakrabarti et al., 1999
- Finding Authorities and Hubs From Link Structures on the World Wide Web, Borodin et al., 2001
- Topic-Sensitive PageRank, Haveliwala, 2002
|
MIR 13.4.4
|
Implementations
|
- Finding What People Want: Experiences with the WebCrawler, Pinkerton, 1994
- Lycos: Design choices in an internet search service, Mauldin, 1997
- AltaVista ranking of query results, van Eylen, 1998
- The Anatomy of a Large-Scale Hypertextual Web Search Engine, Brin and Page, 1998
- DiscoWeb: Applying Link Analysis to Web Search, Davison et al., 1999
- Searching the Web, Arasu et al., 2001
|
|
Search Engine Manipulation
|
|
|