Date |
Topic / Paper(s) |
Presenter |
Critic |
Reviewers |
Mon Jan 14 |
Introduction
Bing Liu. Chapter
1 of Web Data Mining: Introduction, 2007.
Michael J. Hanson and Dylan J. McNamee.
Efficient Reading of Papers in Science and Technology, 2000.
|
Wed Jan 16 |
Discuss Reviewing
Alan Jay Smith.
The Task of the Referee. IEEE Computer, 23(4):65-71, April 1990.
Roy Levin and David D. Redell.
An Evaluation of the Ninth SOSP Submissions -or-
How (and How Not) to Write a Good Systems Paper.
Operating Systems Review, 17(3):35-40, July, 1983.
|
Start Classification
(Chapter 3)
|
Fri Jan 18 |
Classification, continued
|
Discuss projects |
Mon Jan 21 |
Classification, continued
|
Wed Jan 23 |
Bringing
Order to the Web: Automatically Categorizing Search Results,
Chen and Dumais, CHI 2000.
|
Wang |
Strohmaier |
Hong, Yu, Dai |
Fri Jan 25 |
Adversarial
Classification, Dalvi, Domingos, Mausam, Sanghai, Verma,
KDD 2004.
|
Hong |
Yu |
Yang, Wang,
Strohmaier |
Mon Jan 28 |
Finish classification
|
Wed Jan 30 |
Guest Speaker: Prof. Lin Lin on Web usage mining.
Read Chapter 12 (Web Usage Mining).
Web
Usage Mining: Discovery and Applications of Usage
Patterns from Web Data, Srivastava et al., SIGKDD Explorations,
2000.
|
Thu Jan 31 |
Speaker:
Mike Moran: Step-by-Step Search
Marketing Success. Packard 466, 4pm |
Fri Feb 1 |
Start
Clustering (Chapter 4)
|
Mon Feb 4 |
Present project ideas
|
Wed Feb 6 |
Using Web
Structure for Classifying and Describing Web Pages, Glover,
Tsioutsiouliklis, Lawrence, Pennock, and Flake, WWW 2002.
|
Yang |
Wang |
Dai, Hong,
Strohmaier |
Fri Feb 8 |
The Structure of
Broad Topics on the Web, Chakrabarti, Joshi, Punera, Pennock, WWW 2002.
|
Dai |
Hong |
Wang, Yang, Yu |
Mon Feb 11 |
Clustering, continued.
|
Two-page Project Proposals due
|
Wed Feb 13 |
Navigation-Aided
Retrieval, Pandit and Olston, WWW 2007.
|
Yu |
Dai |
Strohmaier,
Yang, Wang |
Fri Feb 15 |
Mining Anchor Text for
Query Refinement, Kraft and Zien, WWW 2004.
|
Strohmaier |
Yang |
Yu, Dai, Hong |
Mon Feb 18 |
Start Information Retrieval
(Chapter 6)
|
Wed Feb 20 |
A
taxonomy of JavaScript redirection spam, Chellapilla and Maykov,
AIRWeb 2007.
|
Hong |
Wang |
Dai,
Strohmaier, Yu |
Fri Feb 22 |
Start Link Analysis
(Chapter 7)
|
Mon Feb 25 |
Finish Link Analysis
|
Wed Feb 27 |
Topic-sensitive
PageRank, Haveliwala, WWW 2002. Also consider the journal version, TKDE
2003.
|
Dai |
Yu |
Wang, Hong, Yang |
Fri Feb 29 |
Guest Speaker: Xiaoguang Qi (Web Page
Classification)
Knowing
a Web Page by the Company It Keeps, Qi and Davison, CIKM 2006.
|
Mon Mar 3 |
Spring
Break |
Wed Mar 5 |
Spring
Break |
Fri Mar 7 |
Spring
Break |
Mon Mar 10 |
Query
Chains: Learning to Rank from Implicit Feedback, Radlinski and
Joachims, KDD 2005 and/or Accurately
Interpreting Clickthrough Data as Implicit Feedback,
Joachims, Granka, Pan, Hembrooke, and Gay, SIGIR 2005. Also skim background
paper: Optimizing
Search Engines using Clickthrough Data, Joachims, KDD 2002.
|
Wang |
Strohmaier |
Hong, Yang, Yu |
Wed Mar 12 |
Guest Speaker: Lan Nie (Topical Link Analysis)
Topical
Link Analysis for Web Search, Nie et al., SIGIR 2006
From
Whence Does Your Authority Come?
Utilizing Community Relevance in Ranking, Nie et al., AAAI 2007
|
Fri Mar 14 |
Ranking the
Web Frontier, Eiron et al., WWW 2004
|
Yang |
Hong |
Dai, Wang, Strohmaier |
Mon Mar 17 |
PageRank
without Hyperlinks: Structural Re-Ranking using Links Induced
by Language Models, Kurland and Lee, SIGIR 2005.
|
Yu |
Yang |
Dai,
Strohmaier, Hong |
Wed Mar 19 |
Sic Transit
Gloria Telae: Towards an Understanding of the Web's Decay, Bar-Yossef
et al., WWW 2004
|
Strohmaier |
Dai |
Yu, Yang, Wang |
Fri Mar 21 |
MapReduce: Simplified Data Processing on Large Clusters, Dean and Ghemawat,
CACM, January 2008. See digital version.
Originally published in OSDI 2004.
Also skim background papers:
The
Google File System, Ghemawat et al., SOSP 2003, and
Web search
for a planet: The Google cluster architecture, Barroso et al.,
IEEE Micro, 2003.
Slides
and Video
from a series of
MapReduce
lectures from Google
|
Mon Mar 24 |
Detecting
Phrase-Level Duplication on the World Wide Web,
Fetterly, Manasse, and Najork, SIGIR 2005.
|
Hong |
Strohmaier |
Dai, Wang,
Yang |
Wed Mar 26 |
When Experts
Agree: Using Non-Affiliated Experts to Rank Popular Topics, Bharat and
Mihaila, WWW 2001
|
Dai |
Wang |
Hong,
Strohmaier, Yu |
Fri Mar 28 |
Status updates on course projects
Assign take-home midterm exam
|
Mar 28-31 |
Midterm exam
|
Mon Mar 31 |
Collect midterm exam
Clustering Algorithms and MapReduce
Slides
and Video
Graph Algorithms and MapReduce
Slides
and
Video
from a series of
MapReduce
lectures from Google
|
Wed Apr 2 |
Web-Page
Summarization Using Clickthrough Data,
Sun, Shen, Zeng, Yang, Lu, and Chen, SIGIR 2005.
|
Wang |
Hong |
Hong, Yu |
Fri Apr 4 |
Class cancelled for participation in HPC Day
|
Mon Apr 7 |
Do Not Crawl in the
DUST: Different URLs with Similar Text, Bar-Yossef, Keidar, and
Schonfeld, WWW 2007.
|
Yang |
Dai |
Dai, Strohmaier |
Wed Apr 9 |
Corroborate
and learn facts from the web, Zhao and Betz, KDD 2007.
|
Yu |
Wang |
Wang, Hong |
Fri Apr 11 |
Opinion observer:
analyzing and comparing opinions on the Web, Liu, Hu, and Cheng,
WWW 2005.
For background and other opinion mining, see Chapter 11 of our
textbook.
|
Strohmaier |
Yang |
Yang, Dai |
Mon Apr 14 |
Query type
classification for web document
retrieval, Kang and Kim, SIGIR 2003.
|
None |
Wed Apr 16 |
Searching
the Workplace Web, Fagin, Kumar, McCurley,
Novak, Sivakumar, Tomlin, and Williamson, WWW 2003.
|
Dai |
Hong |
Hong, Yu |
Fri Apr 18 |
Mining web
multi-resolution community-based popularity for information
retrieval, Park and Ramamohanarao, CIKM 2007.
|
Wang |
Strohmaier |
Strohmaier, Yang |
Mon Apr 21 |
Turn in draft project report
MapReduce:
A major step backwards,
and MapReduce
II, DeWitt and Stonebraker, 2008.
We will also include discussion from
Blog
1 and
Blog
2. Hadoop is an open-source
implementation of MapReduce and of a distributed file system modeled on
the Google File System, and is used in many places, including Yahoo! and Amazon.
|
Wed Apr 23 |
Guest Speaker: Lan Nie
Separate and inequal: Preserving heterogeneity in topical authority
flows, Nie and Davison, SIGIR 2008
|
Fri Apr 25 |
Discussion
|
Thu May 1 |
8-11am,
PL258, Final Exam slot for Presentations |