| Day | Date | Topic(s) | Required Readings | Suggested Readings |
| --- | --- | --- | --- | --- |
| Mon | Jan 15 | Welcome | IIR 1<br>Finding What People Want: Experiences with the WebCrawler, Pinkerton, 1994<br>Lycos: Design choices in an internet search service, Mauldin, 1997 | Levene 1; MIR 1; MtW 1; FOA 1; IR 1 |
| Wed | Jan 17 | Overview | | As we may think, Vannevar Bush, 1945 |
| Fri | Jan 19 | Evaluation | IIR 8 | Levene 2, 5.4; MIR 3; MtW 3.2.1; FOA 4.3; MG 4.5; USE 6.1; IR 7<br>SavvySearch, Howe and Dreilinger, 1997 |
| Mon | Jan 22 | Finish Evaluation; Overview of Indexing Process | | Levene 4.1-4.6 |
| Wed | Jan 24 | Text Preparation | IIR 2 | Levene 5.1; MIR 6.1-6.2, 7.1-7.2; USE 2.1-2.4; FOA 2.2-2.4; MG 3.7 |
| Fri | Jan 26 | Indexing | IIR 4 | MIR 8.1-8.3; MtW 3; FOA 4; MG 3.1-3.2, 3.5-3.6; USE 2.5 |
| Mon | Jan 29 | Inverted Indices | | |
| Wed | Jan 31 | Vector-space model; Zipf's law; term weightings | IIR 7 | MIR 2.1-2.5; MG 4.4; USE 3 |
| Fri | Feb 02 | Compression | IIR 5; MIWch4 4.1-4.3<br>AltaVista ranking of query results, van Eylen, 1998 | MIR 7.4-7.6; MtW 3.1.3; MG 3.3-3.4, 2, 5, 9 |
| Mon | Feb 05 | Finish compression; Queries; Feedback | IIR 9 | Levene 6.4.3; MIR 4, 5, 10.5; FOA 4.2; MG 4.2-4.3; USE 5, 6.2 |
| Wed | Feb 07 | Finish Indexing | | |
| Fri | Feb 09 | Clustering | IIR 16-18; MIWch4 4.5, 4.8 | MIR 5.3.1, 2.7.2; MtW 4; MG 4.6 |
| Mon | Feb 12 | Clustering/Dimension Reduction | | |
| Wed | Feb 14 | Guest speaker: Dr. Marc Najork, Microsoft Research (Maginnes 101) | | |
| Fri | Feb 16 | LSI | | Indexing by Latent Semantic Analysis, Deerwester et al., 1990 |
| Mon | Feb 19 | PLSI | Probabilistic Latent Semantic Indexing, Hofmann, 1999 (required only for 445 students) | Levene 6.1<br>Informed Projections, Cohn, 2002 |
| Wed | Feb 21 | Collaborative Filtering; Supervised Learning | IIR 13-15 | Levene 3, 6.1, 9.4; MIR 2.8; MtW 5; MIWch4 4.6<br>Amazon.com recommendations: item-to-item collaborative filtering, Linden, Smith, and York, 2003<br>Clustering Methods for Collaborative Filtering, Ungar and Foster, 1998 |
| Fri | Feb 23 | Supervised Learning | | |
| Mon | Feb 26 | Bayesian Classification; Review Sample Exam 1 Questions | | |
| Wed | Feb 28 | Short project presentations; Bayesian Networks; Discriminative Classifiers | | |
| Fri | Mar 02 | Finish project presentations | | |
| Mon | Mar 05 | NO CLASS - Spring Break | | |
| Wed | Mar 07 | NO CLASS - Spring Break | | |
| Fri | Mar 09 | NO CLASS - Spring Break | | |
| Mon | Mar 12 | Discriminative classifiers; Semi-supervised learning | Silk from a Sow's Ear: Extracting Usable Structures from the Web, Pirolli, Pitkow, and Rao, 1996<br>ParaSite: Mining Structural Information on the Web, Spertus, 1997 | MtW 6<br>Knowing a Web Page by the Company It Keeps, Qi and Davison, 2006 |
| Wed | Mar 14 | Hourly Exam | | |
| Fri | Mar 16 | Start Social Networks | Content and Link Structure Analysis for Searching the Web, Efe, Raghavan, and Lakhotia, 2004<br>Authoritative Sources in a Hyperlinked Environment, Kleinberg, 1997-1999<br>The Anatomy of a Large-Scale Hypertextual Web Search Engine, Brin and Page, 1998 | Levene 5.2, 6.4.4, 9.1, 9.2; MtW 7; USE 7<br>The PageRank Citation Ranking: Bringing Order to the Web, Page et al., 1998<br>Finding Related Pages in the World Wide Web, Dean and Henzinger, 1999 |
| Mon | Mar 19 | Review Exam 1; Continue with social networks: PageRank | | Learning to Probabilistically Identify Authoritative Documents, Cohn and Chang, 2000<br>What is this Page Known for? Computing Web Page Reputations, Rafiei and Mendelzon, 2000<br>SALSA: The Stochastic Approach for Link-Structure Analysis, Lempel and Moran, 2001<br>SimRank: A Measure of Structural-Context Similarity, Jeh and Widom, 2003<br>Searching the Web, Arasu et al., 2001<br>Finding Authorities and Hubs From Link Structures on the World Wide Web, Borodin et al., 2001<br>Link Analysis, Eigenvectors and Stability, Ng et al., 2001<br>When Experts Agree: Using Non-Affiliated Experts to Rank Popular Topics, Bharat and Mihaila, 2001<br>Ranking the Web Frontier, Eiron et al., 2004 |
| Wed | Mar 21 | Link Analysis: HITS | Improved Algorithms for Topic Distillation in a Hyperlinked Environment, Bharat and Henzinger, 1998<br>Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text, Chakrabarti et al., 1998<br>DiscoWeb: Applying Link Analysis to Web Search, Davison et al., 1999 | |
| Fri | Mar 23 | Discuss "Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text", Chakrabarti et al., 1998 | | |
| Mon | Mar 26 | Discuss "Improved Algorithms for Topic Distillation in a Hyperlinked Environment", Bharat and Henzinger, 1998<br>Link nepotism | Recognizing Nepotistic Links on the Web, Davison, 2000 | |
| Wed | Mar 28 | Discuss "The Anatomy of a Large-Scale Hypertextual Web Search Engine", Brin and Page, 1998<br>Paper presentation (Kar and Wang): "Topic-Sensitive PageRank" | Topic-Sensitive PageRank, Haveliwala, 2003 (shorter, original conference version, 2002)<br>The Missing Link - A Probabilistic Model of Document Content, Cohn and Hofmann, 2001 (required only for 445 students) | |
| Fri | Mar 30 | Paper presentation (Prabhakar and Smith): "Combining Link and Content Information in Web Search"<br>Guest presentation (Lan Nie): "Topical Link Analysis for Web Search" | Combining Link and Content Information in Web Search, Richardson and Domingos, 2004 (original conference version: The Intelligent Surfer: Probabilistic Combination of Link and Content Information in PageRank, 2002)<br>Topical Link Analysis for Web Search, Nie, Davison, and Qi, 2006 | |
| Mon | Apr 02 | Project 2 presentations | | |
| Wed | Apr 04 | Last project 2 presentation<br>Paper presentation (Moukhine and Wojciechowski): "Detecting Spam Web Pages through Content Analysis" | Detecting Spam Web Pages through Content Analysis, Ntoulas et al., 2006<br>What's New on the Web? The Evolution of the Web from a Search Engine Perspective, Ntoulas et al., 2004 | Levene 5.3, 9.6; MtW 7<br>Graph structure in the web, Broder et al., 2000<br>Topical Locality in the Web, Davison, 2000<br>Sic Transit Gloria Telae: Towards an Understanding of the Web's Decay, Bar-Yossef et al., 2004 |
| Fri | Apr 06 | Link analysis<br>Paper presentation (Deak and Bhandari): "What's new on the Web?" | | MtW 8<br>Inferring Web Communities from Link Topology, Gibson et al., 1998<br>Focused crawling: a new approach to topic-specific Web resource discovery, Chakrabarti et al., 1999<br>Trawling the web for emerging cyber-communities, Kumar et al., 1999 |
| Mon | Apr 09 | Modeling the Web<br>Paper presentation (Moukhine and Wojciechowski): "The Google File System" | The Link Database: Fast Access to Graphs of the Web, Randall et al., 2001<br>The Google File System, Ghemawat et al., 2003<br>MapReduce: Simplified Data Processing on Large Clusters, Dean and Ghemawat, 2004 | The Connectivity Server: fast access to linkage information on the Web, Bharat et al., 1998 |
| Wed | Apr 11 | Paper presentation (Brendan Melville): "MapReduce: Simplified Data Processing on Large Clusters"<br>Review Sample Exam 2 Questions<br>DiscoWeb | | |
| Fri | Apr 13 | NO CLASS - Inauguration of university president | | |
| Mon | Apr 16 | Hourly Exam | | |
| Wed | Apr 18 | Resource Discovery | Scaling Personalized Web Search, Jeh and Widom, 2003 | Levene 4.7<br>A Web caching primer, Davison, 2001<br>Lessons from Giant-Scale Services, Brewer, 2001<br>Locality in search engine queries and its implication for caching, Xie and O'Hallaron, 2002<br>Efficient Computation of PageRank, Haveliwala, 1999<br>On Caching Search Engine Query Results, Markatos, 2000<br>Rank-preserving two level caching for scalable search engines, Saraiva et al., 2001<br>Server-side design principles for scalable internet systems, Roe and Gonik, 2002<br>Building a Distributed Full-Text Index for the Web, Melnik et al., 2000<br>Optimized Query Execution in Large Search Engines with Global Page Ordering, Long and Suel, 2003<br>ODISSEA: A Peer-to-Peer Architecture for Scalable Web Search and Information Retrieval, Suel et al., 2003<br>Optimizing Result Prefetching in Web Search Engines With Segmented Indices, Lempel and Moran, 2002<br>Extrapolation Methods for Accelerating PageRank Computations, Kamvar et al., 2003<br>Exploiting the Block Structure of the Web for Computing PageRank, Kamvar et al., 2003<br>Adaptive Methods for the Computation of PageRank, Kamvar et al., 2003<br>An Analytical Comparison of Approaches to Personalizing PageRank, Haveliwala et al., 2003<br>Mining the Space of Graph Properties, Jeh and Widom, 2003<br>The Second Eigenvalue of the Google Matrix, Haveliwala and Kamvar, 2003<br>The WebGraph Framework I: Compression Techniques, Boldi and Vigna, 2004<br>Web search for a planet: The Google cluster architecture, Barroso et al., 2003<br>Failure Trends in a Large Disk Drive Population, Pinheiro, Weber and Barroso, 2007 |
| Fri | Apr 20 | Scaling to the Web<br>Paper presentation (Wang and Kar): "Scaling Personalized Web Search" | Three-Level Caching for Efficient Query Processing in Large Web Search Engines, Long and Suel, 2005 | |
| Mon | Apr 23 | Paper presentation (Smith and Prabhakar): "Three-Level Caching for Efficient Query Processing in Large Web Search Engines"<br>Paper presentation (Bhandari and Deak): "Crawling the Hidden Web" | Crawling the Hidden Web, Raghavan and Garcia-Molina, 2001 | Levene 4.6 |
| Wed | Apr 25 | Web Crawling | | MIR 13.4.5; MtW 2<br>Keeping up with the changing Web, Brewington and Cybenko, 2000<br>The Evolution of the Web and Implications for an Incremental Crawler, Cho and Garcia-Molina, 2000<br>High-Performance Web Crawling, Najork and Heydon, 2001<br>Parallel Crawlers, Cho and Garcia-Molina, 2002<br>Design and Implementation of a High-Performance Distributed Web Crawler, Shkapenyuk and Suel, 2001<br>UbiCrawler: A Scalable Fully Distributed Web Crawler, Boldi et al., 2002 |
| Fri | Apr 27 | Finish Crawling | | |
| Wed | May 02 | Final Presentations, 4-7pm, Maginnes 102 | | |