The course Web page is http://www.cse.lehigh.edu/~brian/course/searchengines/. Lecture notes, assignments, and readings will all be available there.
The course will be taught by Prof. Brian D. Davison. My email is davison (at) cse.lehigh.edu. My homepage is http://www.cse.lehigh.edu/~brian/ where my office hours and contact information can be found.
Lectures will be held Mon/Wed/Fri 11:10-10:00am in Maginnes 103.
Prerequisite: CSE 109 Systems Programming or graduate status. It would also be helpful to have taken one or more courses in networking, software engineering, operating systems, databases, data mining, machine learning, pattern recognition, numerical analysis, or information retrieval.
Course objective: to provide a practical understanding of the design and implementation of modern WWW search engines and their algorithms. This objective is accomplished through a combination of lectures, discussion and analysis of published papers, and extensive hands-on programming projects.
Homework, presentations, and group programming projects
Two hourly midterms (no final exam)
No required texts. Recommended:
- Modern Information Retrieval, Baeza-Yates and Ribeiro-Neto, Addison-Wesley (1999). Available from the university and online bookstores.
- An Introduction to Search Engines and Web Navigation, Mark Levene, Addison-Wesley (2006). Not available locally, but carried by Amazon.com.
Additional references
We will also use excerpts from drafts of Introduction to Information Retrieval, Manning, Raghavan and Schütze, Cambridge University Press (2007) and possibly also Mining the Web: Discovering Knowledge from Hypertext Data, 2nd Ed., Chakrabarti (forthcoming).
In addition, conference and journal papers will be assigned and placed online.
I have placed the following books on reserve: Mining the Web: Discovering Knowledge from Hypertext Data, 1st Ed., by Chakrabarti (Morgan Kaufmann, 2003); Managing Gigabytes, 2nd Ed., by Witten, Moffat, and Bell (Morgan Kaufmann, 1999); Finding Out About, by Belew (Cambridge, 2000); and Understanding Search Engines, 2nd Ed., by Berry and Browne (SIAM, 2005). Furthermore, Information Retrieval by Van Rijsbergen (Butterworths, 1979) is available on the Web at http://www.dcs.gla.ac.uk/Keith/Preface.html.
Grading
Expected grading: homework will be worth 10%; class participation and presentations 25%; the hourly exams 25% (no final); and the projects 40%. Presentations for final projects will be made during the final exam slot.
Course Topics
We will cover many topics in this course over the semester. They are expected to include:
- Search engine implementations
- Information retrieval (IR) models
- Performance evaluation
- Properties of hypertext
- Crawling, indexing, searching and ranking
- Web link analysis
- Parallel and distributed IR
- User interfaces
Computer Facilities
The primary computer resource will be the various CSE/ECE Sun workstations (e.g., those in PL122) running the Solaris version of the UNIX operating system, but students are free to utilize other (equivalent) computers for developing their programming assignments. However, all programming assignments, unless explicitly stated otherwise, must work correctly and be submitted on the Suns.
Policy on Academic Integrity and Collaboration
All work, unless explicitly stated otherwise in the problem definition, is to be an individual effort. You are encouraged to discuss assignments with one another, your friends, and the instructors and graders of the course. Indeed, this may be the most effective method of learning. You may share concepts, approaches, and strategies for producing a solution. However, all work submitted in your name must be your own. If necessary, violations will be treated as cases of academic dishonesty.
It is sometimes difficult to know where to draw the line between educationally useful sharing of ideas and the educationally destructive copying of ideas. Please refer to the "Collaboration Policy" statement for more examples of what is and what is not unfair collaboration.
Policy on Disabilities
If you have a disability for which you are or may be requesting accommodations, please contact your professor and the Office of Academic Services, Room 212, University Center, or call 610-758-4152 as early as possible in the semester. University policy states that you must notify your professor seven (7) days prior to the exam.
Other Relevant University Policies
There are many other university policies described in the course catalog. A few that also apply here include: