WWW Search Engines: Algorithms, Architectures, and Implementations
CSE 345/445 Spring 2007 Syllabus (draft)

Course Web page
The course Web page is http://www.cse.lehigh.edu/~brian/course/searchengines/.  Lecture notes, assignments, and online readings will all be available online.
Instructor
The course will be taught by Prof. Brian D. Davison. My email is davison (at) cse.lehigh.edu. My homepage is http://www.cse.lehigh.edu/~brian/ where my office hours and contact information can be found.
Time/Location
Lectures will be held Mon/Wed/Fri 9:10-10:00am in Maginnes 103.
Prerequisites
CSE 109 Systems Programming or graduate status. It would be helpful to have one or more courses in networking, software engineering, operating systems, databases, data mining, machine learning, pattern recognition, numerical analysis, or information retrieval.
Objective
To provide a practical understanding of the design and implementation of modern WWW search engines and their algorithms. This objective is accomplished through a combination of lectures, discussion and analysis of published papers, and extensive hands-on programming projects.
Expected Work
Homework, presentations, and group programming projects
Examinations
Two hourly midterms (no final exam)
Textbooks
No required texts. Recommended: Modern Information Retrieval, Baeza-Yates and Ribeiro-Neto, Addison-Wesley (1999). Available from the university and online bookstores. An Introduction to Search Engines and Web Navigation, Mark Levene, Addison-Wesley (2006). Not available locally, but carried by Amazon.com.

In addition, we will utilize excerpts from drafts of Introduction to Information Retrieval, Manning, Raghavan and Schütze, Cambridge University Press (2007) and possibly also Mining the Web: Discovering Knowledge from Hypertext Data, 2nd Ed., Chakrabarti (forthcoming).

Conference and journal papers will also be assigned and placed online.

Additional references
I have placed Mining the Web: Discovering Knowledge from Hypertext Data, 1st Ed., Chakrabarti (Morgan Kaufmann, 2003), Managing Gigabytes, 2nd Ed., by Witten, Moffat, and Bell (Morgan Kaufmann, 1999), Finding Out About, by Belew (Cambridge, 2000), and Understanding Search Engines, 2nd Ed., by Berry and Browne (SIAM, 2005) on reserve. Furthermore, Information Retrieval by Van Rijsbergen (Butterworths, 1979) is available on the Web at http://www.dcs.gla.ac.uk/Keith/Preface.html.
Grading
Expected grading: homework will be worth 10%; class participation and presentations 25%; the hourly exams 25% (no final); and the projects 40%. Presentations for final projects will be made during the final exam slot.
Course Topics
We will cover many topics in this course over the semester. They are expected to include:
  • Search engine implementations
  • Information retrieval (IR) models
  • Performance evaluation
  • Properties of hypertext
  • Crawling, indexing, searching and ranking
  • Web link analysis
  • Parallel and distributed IR
  • User interfaces
Computer Facilities
The primary computer resource will be the various CSE/ECE Sun workstations (e.g., those in PL122) running the Solaris version of the UNIX operating system, but students are free to utilize other (equivalent) computers for developing their programming assignments. However, all programming assignments, unless explicitly stated otherwise, must work correctly and be submitted on the Suns.
Policy on Academic Integrity and Collaboration
All work, unless explicitly stated otherwise in the problem definition, is to be an individual effort. You are encouraged to discuss assignments with one another, your friends, and with the instructors and graders of the course. Indeed, this may be the most effective method of learning. You may share concepts, approaches, and strategies for producing a solution. However, all work submitted in your name must be your own. If necessary, violations will be treated as cases of academic dishonesty.

It is sometimes difficult to know where to draw the line between educationally useful sharing of ideas and the educationally destructive copying of ideas. Please refer to the "Collaboration Policy" statement for more examples of what is and what is not unfair collaboration.

Policy on Disabilities
If you have a disability for which you are or may be requesting accommodations, please contact your professor and the Office of Academic Services, Room 212, University Center, or call 610-758-4152 as early as possible in the semester. University policy states that you must notify your professor seven (7) days prior to the exam.
Other Relevant University Policies
There are many other university policies described in the course catalog. A few that also apply here include:

This page is http://www.cse.lehigh.edu/~brian/course/2007/searchengines/syllabus.html
Last revised: 20 February 2007.