Instructor Prof. Brian D. Davison
Time/Location MWF 1:10-2:00pm in Packard 503
Catalog Description Study of algorithms, architectures, and implementations of WWW search engines. Information retrieval (IR) models; performance evaluation; properties of hypertext; crawling, indexing, searching and ranking; link analysis; parallel and distributed IR; user interfaces.
Prerequisites CSE 109 Systems Programming or CSE graduate status
Recommended One or more courses in operating systems, databases, data mining, machine learning/pattern recognition, networking, or numerical analysis.
Introduction
With billions of addressable documents publicly accessible, WWW search engines continue to be fundamental to information seeking on the Web. The scale of these engines, both in content and in access, makes the algorithms, architectures, and implementations of these systems challenging. This course is designed for upper-level undergraduates and graduate students interested in learning how Web search engines function.
This course focuses on the technologies for storing large-scale document datasets and retrieving information from them. Particular emphasis is given to the data structures and algorithms needed to build efficient search engines for the World Wide Web (WWW). Topics covered include information retrieval (IR) models, performance evaluation, query languages and operations, properties of hypertext, crawling, indexing, searching, ranking, link analysis, parallel and distributed IR, and user interfaces. Students will participate in class projects involving both the creation and management of a large document collection on the WWW. These projects will require programming in languages such as Python/Perl, C/C++, or Java.
Textbook Introduction to Information Retrieval, Manning, Raghavan and Schütze, Cambridge University Press (2008). Note: available free online.
Useful Links CourseSite, Piazza