CSE 450: Web Mining Seminar
Spring 2008 Reading Schedule
(subject to change)

Date | Topic / Paper(s) | Presenter | Critic | Reviewers
Mon Jan 14 Introduction
  • Bing Liu. Chapter 1 of Web Data Mining: Introduction, 2007.
  • Michael J. Hanson and Dylan J. McNamee. Efficient Reading of Papers in Science and Technology, 2000.
    Wed Jan 16 Discuss Reviewing
  • Alan Jay Smith. The Task of the Referee. IEEE Computer, 23(4):65-71, April 1990.
  • Roy Levin and David D. Redell. An Evaluation of the Ninth SOSP Submissions -or- How (and How Not) to Write a Good Systems Paper. Operating Systems Review, 17(3):35-40, July 1983.
  • Start Classification (Chapter 3)
    Fri Jan 18 Classification, continued
    Discuss projects
    Mon Jan 21 Classification, continued
    Wed Jan 23
  • Bringing Order to the Web: Automatically Categorizing Search Results, Chen and Dumais, CHI 2000.
    Presenter: Wang; Critic: Strohmaier; Reviewers: Hong, Yu, Dai
    Fri Jan 25
  • Adversarial Classification, Dalvi, Domingos, Mausam, Sanghai, Verma, KDD 2004.
    Presenter: Hong; Critic: Yu; Reviewers: Yang, Wang, Strohmaier
    Mon Jan 28 Finish classification
    Wed Jan 30 Guest Speaker: Prof. Lin Lin on Web usage mining. Read Chapter 12 (Web Usage Mining).
  • Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data, Srivastava et al., SIGKDD Explorations, 2000.
    Thu Jan 31 Speaker: Mike Moran: Step-by-Step Search Marketing Success. Packard 466, 4pm
    Fri Feb 1 Start Clustering (Chapter 4)
    Mon Feb 4 Present project ideas
    Wed Feb 6
  • Using Web Structure for Classifying and Describing Web Pages, Glover, Tsioutsiouliklis, Lawrence, Pennock, and Flake, WWW 2002.
    Presenter: Yang; Critic: Wang; Reviewers: Dai, Hong, Strohmaier
    Fri Feb 8
  • The Structure of Broad Topics on the Web, Chakrabarti, Joshi, Punera, Pennock, WWW 2002.
    Presenter: Dai; Critic: Hong; Reviewers: Wang, Yang, Yu
    Mon Feb 11 Clustering, continued.
    Two-page Project Proposals due
    Wed Feb 13
  • Navigation-Aided Retrieval, Pandit and Olston, WWW 2007.
    Presenter: Yu; Critic: Dai; Reviewers: Strohmaier, Yang, Wang
    Fri Feb 15
  • Mining Anchor Text for Query Refinement, Kraft and Zien, WWW 2004.
    Presenter: Strohmaier; Critic: Yang; Reviewers: Yu, Dai, Hong
    Mon Feb 18 Start Information Retrieval (Chapter 6)
    Wed Feb 20
  • A taxonomy of JavaScript redirection spam, Chellapilla and Maykov, AIRWeb 2007.
    Presenter: Hong; Critic: Wang; Reviewers: Dai, Strohmaier, Yu
    Fri Feb 22 Start Link Analysis (Chapter 7)
    Mon Feb 25 Finish Link Analysis
    Wed Feb 27
  • Topic-sensitive PageRank, Haveliwala, WWW 2002. Also consider the journal version, TKDE 2003.
    Presenter: Dai; Critic: Yu; Reviewers: Wang, Hong, Yang
    Fri Feb 29 Guest Speaker: Xiaoguang Qi (Web Page Classification)
  • Knowing a Web Page by the Company It Keeps, Qi and Davison, CIKM 2006.
    Mon Mar 3 Spring Break
    Wed Mar 5 Spring Break
    Fri Mar 7 Spring Break
    Mon Mar 10
  • Query Chains: Learning to Rank from Implicit Feedback, Radlinski and Joachims, KDD 2005 and/or Accurately Interpreting Clickthrough Data as Implicit Feedback, Joachims, Granka, Pan, Hembrooke, and Gay, SIGIR 2005. Also skim background paper: Optimizing Search Engines using Clickthrough Data, Joachims, KDD 2002.
    Presenter: Wang; Critic: Strohmaier; Reviewers: Hong, Yang, Yu
    Wed Mar 12 Guest Speaker: Lan Nie (Topical Link Analysis)
  • Topical Link Analysis for Web Search, Nie et al., SIGIR 2006
  • From Whence Does Your Authority Come? Utilizing Community Relevance in Ranking, Nie et al., AAAI 2007
    Fri Mar 14
  • Ranking the Web Frontier, Eiron et al., WWW 2004.
    Presenter: Yang; Critic: Hong; Reviewers: Dai, Wang, Strohmaier
    Mon Mar 17
  • PageRank without Hyperlinks: Structural Re-Ranking using Links Induced by Language Models, Kurland and Lee, SIGIR 2005.
    Presenter: Yu; Critic: Yang; Reviewers: Dai, Strohmaier, Hong
    Wed Mar 19
  • Sic Transit Gloria Telae: Towards an Understanding of the Web's Decay, Bar-Yossef et al., WWW 2004.
    Presenter: Strohmaier; Critic: Dai; Reviewers: Yu, Yang, Wang
    Fri Mar 21
  • MapReduce: Simplified Data Processing on Large Clusters, Dean and Ghemawat, CACM, January 2008. See digital version. Originally published in OSDI 2004. Also skim background papers: The Google File System, Ghemawat et al., SOSP 2003, and Web search for a planet: The Google cluster architecture, Barroso et al., IEEE Micro, 2003.
  • Slides and Video from a series of MapReduce lectures from Google
    Mon Mar 24
  • Detecting Phrase-Level Duplication on the World Wide Web, Fetterly, Manasse, and Najork, SIGIR 2005.
    Presenter: Hong; Critic: Strohmaier; Reviewers: Dai, Wang, Yang
    Wed Mar 26
  • When Experts Agree: Using Non-Affiliated Experts to Rank Popular Topics, Bharat and Mihaila, WWW 2001
    Presenter: Dai; Critic: Wang; Reviewers: Hong, Strohmaier, Yu
    Fri Mar 28 Status updates on course projects
    Assign take-home midterm exam
    Mar 28-31 Midterm exam
    Mon Mar 31 Collect midterm exam
  • Clustering Algorithms and MapReduce Slides and Video
  • Graph Algorithms and MapReduce Slides and Video
    from a series of MapReduce lectures from Google
    Wed Apr 2
  • Web-Page Summarization Using Clickthrough Data, Sun, Shen, Zeng, Yang, Lu, and Chen, SIGIR 2005.
    Presenter: Wang; Critic: Hong; Reviewers: Hong, Yu
    Fri Apr 4 Class cancelled for participation in HPC Day
    Mon Apr 7
  • Do Not Crawl in the DUST: Different URLs with Similar Text, Bar-Yossef, Keidar, and Schonfeld, WWW 2007.
    Presenter: Yang; Critic: Dai; Reviewers: Dai, Strohmaier
    Wed Apr 9
  • Corroborate and learn facts from the web, Zhao and Betz, KDD 2007.
    Presenter: Yu; Critic: Wang; Reviewers: Wang, Hong
    Fri Apr 11
  • Opinion observer: analyzing and comparing opinions on the Web, Liu, Hu, and Cheng, WWW 2005.
    For background and other opinion mining, see Chapter 11 of our textbook.
    Presenter: Strohmaier; Critic: Yang; Reviewers: Yang, Dai
    Mon Apr 14
  • Query type classification for web document retrieval, Kang and Kim, SIGIR 2003.
    Presenter: None
    Wed Apr 16
  • Searching the Workplace Web, Fagin, Kumar, McCurley, Novak, Sivakumar, Tomlin, and Williamson, WWW 2003.
    Presenter: Dai; Critic: Hong; Reviewers: Hong, Yu
    Fri Apr 18
  • Mining web multi-resolution community-based popularity for information retrieval, Park and Ramamohanarao, CIKM 2007.
    Presenter: Wang; Critic: Strohmaier; Reviewers: Strohmaier, Yang
    Mon Apr 21 Turn in draft project report
  • MapReduce: A major step backwards, and MapReduce II, DeWitt and Stonebraker, 2008.
    We will also include discussion from Blog 1 and Blog 2. Hadoop is an open source implementation of MapReduce and the Google File System, and is used in many places, including Yahoo! and Amazon.
    Wed Apr 23 Guest Speaker: Lan Nie
  • Separate and inequal: Preserving heterogeneity in topical authority flows, Nie and Davison, SIGIR 2008
    Fri Apr 25 Discussion
    Thu May 1 8-11am, PL258, Final Exam slot for Presentations


    This page is http://www.cse.lehigh.edu/~brian/course/2008/webmining/schedule.html
    Last revised: 25 April 2008, Prof. Davison.