CSE345/445: WWW Search Engines
Algorithms, Architectures and Implementations (Spring 2017)

Instructor Prof. Brian D. Davison
davison(at)cse.lehigh.edu
http://www.cse.lehigh.edu/~brian/

Time/Location MWF 1:10-2:00pm in Packard 503

Catalog Description Study of algorithms, architectures, and implementations of WWW search engines. Information retrieval (IR) models; performance evaluation; properties of hypertext; crawling, indexing, searching and ranking; link analysis; parallel and distributed IR; user interfaces.

Prerequisites CSE 109 Systems Programming or CSE graduate status

Recommended One or more courses in operating systems, databases, data mining, machine learning/pattern recognition, networking, or numerical analysis.

Introduction With billions of addressable documents publicly accessible, WWW search engines continue to be fundamental to information seeking on the Web. The scale of these engines, both in content and in access, makes the algorithms, architectures, and implementations of these systems challenging. This course is designed for upper-level undergraduates and graduate students interested in learning how Web search engines function.
This course focuses on the technologies for storing and retrieving information from large-scale document datasets. Particular emphasis is given to the data structures and algorithms needed to build efficient search engines for the World Wide Web (WWW). Topics covered include information retrieval (IR) models, performance evaluation, query languages and operations, properties of hypertext, crawling, indexing, searching, ranking, link analysis, parallel and distributed IR, and user interfaces. Students will participate in class projects involving both the creation and management of a large document collection on the WWW. These projects will require programming in languages such as Python/Perl, C/C++, or Java.

Textbook Introduction to Information Retrieval, Manning, Raghavan and Schütze, Cambridge University Press (2008). Note: available free online.

Useful Links CourseSite, Piazza

This page is http://www.cse.lehigh.edu/~brian/course/searchengines/
Last revised: 17 February 2017.

