| Topic |
Required Readings |
Recommended Readings |
|
Introduction |
MIR 1, Chakrabarti 1
|
FOA 1, IR 1 |
|
Evaluation |
MIR 3, Chakrabarti 3.2.1
|
FOA 4.3, MG 4.5, USE 6.1, IR 7
|
|
Text Preparation |
MIR 6.1-6.2, 7.1-7.2 |
USE 2.1-2.4, FOA 2.2-2.4, MG 3.7 |
|
Indexing |
MIR 8.1-8.3, Chakrabarti 3
|
FOA 4, MG 3.1-3.2, 3.5-3.6, USE 2.5 |
|
Vector Space Model |
MIR 2.1-2.5 |
MG 4.4, USE 3 |
|
Query languages and operations |
MIR 4, 5, 10.5 |
FOA 4.2, MG 4.2-4.3, USE 5, 6.2 |
|
Compression |
MIR 7.4-7.6, Chakrabarti 3.1.3 |
MG 3.3-3.4, 2, 5, 9
|
|
Clustering |
MIR 5.3.1, 2.7.2, Chakrabarti 4
|
MG 4.6
|
|
Supervised Learning |
MIR 2.8, Chakrabarti 5 |
|
|
Semi-supervised Learning |
Chakrabarti 6
|
|
|
Social Networks/Relationship Analysis |
Chakrabarti 7,
- Content and Link Structure Analysis for Searching the Web, Efe, Raghavan, and Lakhotia, 2004.
- Authoritative Sources in a Hyperlinked Environment, Kleinberg, 1997-1999.
- Improved Algorithms for Topic Distillation in a Hyperlinked Environment, Bharat and Henzinger, 1998.
- Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text, Chakrabarti et al., 1998.
- DiscoWeb: Applying Link Analysis to Web Search, Davison et al., 1999.
- Recognizing Nepotistic Links on the Web, Davison, 2000.
- The Anatomy of a Large-Scale Hypertextual Web Search Engine, Brin and Page, 1998.
- Topic-Sensitive PageRank, Haveliwala, 2003. (Shorter, original conference version, 2002.)
- The Missing Link - A Probabilistic Model of Document Content, Cohn and Hofmann, 2001.
- Combining Link and Content Information in Web Search, Richardson and Domingos, 2004. (Original conference version: The Intelligent Surfer: Probabilistic Combination of Link and Content Information in PageRank, 2002.)
|
- The PageRank Citation Ranking: Bringing Order to the Web, Page et al., 1998.
- The Connectivity Server: fast access to linkage information on the Web, Bharat et al., 1998.
- Finding Related Pages in the World Wide Web, Dean and Henzinger, 1999.
- Learning to Probabilistically Identify Authoritative Documents, Cohn and Chang, 2000.
- What is this Page Known for? Computing Web Page Reputations, Rafiei and Mendelzon, 2000.
- SALSA: The Stochastic Approach for Link-Structure Analysis, Lempel and Moran, 2001.
- SimRank: A Measure of Structural-Context Similarity, Jeh and Widom, 2002.
- Searching the Web, Arasu et al., 2001.
- Finding Authorities and Hubs From Link Structures on the World Wide Web, Borodin et al., 2001.
- Link Analysis, Eigenvectors and Stability, Ng et al., 2001.
- When Experts Agree: Using Non-Affiliated Experts to Rank Popular Topics, Bharat and Mihaila, 2001.
- Ranking the Web Frontier, Eiron et al., 2004.
|
|
Measuring and modeling the Web |
Chakrabarti 7
|
|
|
Resource Discovery |
Chakrabarti 8 |
|
|
Scaling to the Web |
|
- A Web caching primer, Davison, 2001.
- Lessons from Giant-Scale Services, Brewer, 2001. (Draft; IEEE published version.)
- Locality in search engine queries and its implication for caching, Xie and O'Hallaron, 2002.
- Efficient Computation of PageRank, Haveliwala, 1999.
- On Caching Search Engine Query Results, Markatos, 2000.
- Rank-preserving two level caching for scalable search engines, Saraiva et al., 2001.
- Server-side design principles for scalable internet systems, Roe and Gonik, 2002. (Local copy.)
- Building a Distributed Full-Text Index for the Web, Melnik et al., 2000.
- Optimized Query Execution in Large Search Engines with Global Page Ordering, Long and Suel, 2003.
- ODISSEA: A Peer-to-Peer Architecture for Scalable Web Search and Information Retrieval, Suel et al., 2003.
- Optimizing Result Prefetching in Web Search Engines With Segmented Indices, Lempel and Moran, 2002.
- Extrapolation Methods for Accelerating PageRank Computations, Kamvar et al., 2003.
- Exploiting the Block Structure of the Web for Computing PageRank, Kamvar et al., 2003.
- Adaptive Methods for the Computation of PageRank, Kamvar et al., 2003.
- An Analytical Comparison of Approaches to Personalizing PageRank, Haveliwala et al., 2003.
- Mining the Space of Graph Properties, Jeh and Widom, 2003.
- The Second Eigenvalue of the Google Matrix, Haveliwala and Kamvar, 2003.
- The WebGraph Framework I: Compression Techniques, Boldi and Vigna, 2004.
- Web search for a planet: The Google cluster architecture, Barroso et al., 2003.
- MapReduce: Simplified Data Processing on Large Clusters, Dean and Ghemawat, 2004.
|
|
Web Crawling |
MIR 13.4.5, Chakrabarti 2
|
- Keeping up with the changing Web, Brewington and Cybenko, 2000.
- The Evolution of the Web and Implications for an Incremental Crawler, Cho and Garcia-Molina, 2000.
- High-Performance Web Crawling, Najork and Heydon, 2001.
- Parallel Crawlers, Cho and Garcia-Molina, 2002.
- Design and Implementation of a High-Performance Distributed Web Crawler, Shkapenyuk and Suel, 2001.
- UbiCrawler: A Scalable Fully Distributed Web Crawler, Boldi et al., 2002.
|