CSE345/445: WWW Search Engines
Algorithms, Architectures and Implementations
(Spring 2015)

Instructor Prof. Brian D. Davison
Time/Location Tue/Thu 1:10-2:25pm in Packard 360
Catalog Description Study of algorithms, architectures, and implementations of WWW search engines. Information retrieval (IR) models; performance evaluation; properties of hypertext; crawling, indexing, searching and ranking; link analysis; parallel and distributed IR; user interfaces.
Prerequisites CSE 109 Systems Programming or CSE graduate status
Recommended One or more courses in networking, software engineering, operating systems, databases, data mining, machine learning, pattern recognition, numerical analysis, or information retrieval.
Introduction With billions of addressable documents publicly accessible, WWW search engines continue to be fundamental to information seeking on the Web. The scale of these engines, both in content and in access, makes the algorithms, architectures, and implementations of these systems challenging. This course is designed for upper-level undergraduates and graduate students interested in learning how Web search engines function.

This course focuses on the technologies for storing and retrieving large-scale hypertext datasets. Particular emphasis is given to the data structures and algorithms needed to build efficient search engines for the World Wide Web (WWW). Topics covered include: information retrieval (IR) models, performance evaluation, query languages and operations, properties of hypertext, crawling, indexing, searching, ranking, link analysis, parallel and distributed IR, and user interfaces. Students will participate in class projects involving both the creation and management of a large document collection on the WWW. These projects will require programming in languages such as Python/Perl, C/C++, or Java.

Useful Links CourseSite, Piazza

This page is http://www.cse.lehigh.edu/~brian/course/searchengines/
Last revised: 17 January 2015.