CS 326/426: Foundation of Machine Learning

Course Information

Description: This introductory course offers a broad overview of the main techniques in machine learning. Students will study the theory, algorithms, and implementations of machine learning. Topics include supervised learning (decision trees, naive Bayes, regression, boosting, SVM, perceptron); learning theory (bias/variance tradeoff, PAC learning, VC theory); probabilistic graphical models; unsupervised learning (dimensionality reduction, clustering, the EM algorithm); deep neural networks; and reinforcement learning.

Lectures: Monday/Wednesday 8:45-10:00. Location: Room 241, Rauch Business Center.

Office Hours: Sandbox in Packard Lab

Prof. Xie: Tuesday 10:00 am - noon.
Jiaxin Liu: Tuesday/Thursday 4:10 pm-5:10 pm.

Prerequisites: For CSE 326: (CSE 002 or CSE 012) and (Math 205 or Math 43) and (Math 231 or ISE 121 or ECO 045).
We will use Python as the only language for implementation assignments. You can check this out for a background assessment.

Format: one take-home midterm, one final exam, 7 homework assignments, and 4 coding projects (with different levels of difficulty for 326/426).

Submission: All homework and projects must be submitted to Coursesite. Homework must be in PDF format, either prepared with Word, LaTeX, or Pages, or scanned from your handwritten work. LaTeX is highly recommended, and here is a short yet comprehensive tutorial on LaTeX (covering tables, images, equations, paragraphs, and sections). Word and Pages also support typing equations, but can be a bit awkward.

Grading: Midterm (15%), homework (30%), coding projects (30%), final (25%). Late submissions will be penalized 20% of the assignment's total grade per late day (a fraction of a day is rounded up to one full day), and no assignment will be accepted more than four days after its due date.
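As an illustration of the late policy, the short Python sketch below computes a late-adjusted score. It is not an official grading script: the function name `late_adjusted_score` and its parameters are hypothetical, and it assumes the 20% penalty is taken from the assignment's maximum score for each late day and that a submission past the four-day limit receives zero.

```python
import math


def late_adjusted_score(raw_score, max_score, hours_late):
    """Return the score after the late penalty (illustrative sketch only).

    Assumptions (not stated verbatim in the policy):
    - the 20% per-day penalty is a fraction of max_score, not of raw_score;
    - a submission more than four days late is not accepted and scores 0.
    """
    if hours_late <= 0:
        return float(raw_score)  # on time: no penalty

    # A fraction of a day is rounded up to one full day.
    late_days = math.ceil(hours_late / 24)

    if late_days > 4:
        return 0.0  # past the four-day limit: not accepted

    penalty = 0.20 * max_score * late_days
    return max(0.0, raw_score - penalty)


if __name__ == "__main__":
    # Raw score of 90 out of 100 at several lateness values (in hours).
    for hours in (0, 1, 25, 96, 97):
        print(hours, late_adjusted_score(90, 100, hours))
```

Under these assumptions, a submission that is one hour late loses a full day's penalty, and one that is 25 hours late loses two days' worth.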

Textbooks

PRML= Pattern Recognition and Machine Learning, Christopher M. Bishop, Springer, 2011 (required).

UML= Understanding Machine Learning: From Theory to Algorithms, Shai Shalev-Shwartz and Shai Ben-David, 1st Edition, Cambridge University Press, 2014 (required). eBook.

ML= Machine Learning: A Probabilistic Perspective, Kevin P. Murphy, MIT Press, 2012 (optional). eBook.

RL= Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto, Second Edition, MIT Press, 2017 (optional). eBook.

Online Resources

Coursesite: assignments and grades posting Link.

Piazza: question answering Link.

This website: general course information.

Schedule

Projects

Description: There are four programming projects (three individual projects and one team project). Each student must implement the projects in Python to get full credit. For individual projects, no machine learning packages (such as TensorFlow and scikit-learn) are allowed, although you can use more fundamental packages for plotting (matplotlib) and for data representations and operations (numpy, scipy, networkx). The focus is on a sufficiently deep understanding of the models and algorithms rather than experience running machine learning models. For the team project, depending on the nature of the project, you may use off-the-shelf software to expedite your exploration (with the permission of the instructor). Sharing or copying solutions is considered a violation of the honor code. This includes, but is not limited to, sharing code through any kind of media (including repositories on GitHub/Bitbucket) and copying solutions from online forums, repositories, blogs, textbook solution manuals, or previous years' submissions.

Project 1 [Individual project] Implement logistic regression.
Project 2 [Individual project] Implement SVM in the primal and dual.
Project 3 [Individual project] NN or RL related. Details TBD.
Project 4 [Team project] Project description: This project consists of understanding and implementing ideas from a classic machine learning paper with a certain level of sophistication. Example papers can be found in this suggested project list. Individual projects can't be repeated here, although a more advanced model based on one of them is fine (e.g., implementing Bayesian logistic regression is considered different from Project 1). Each team consists of 2-3 students (single-student teams are not allowed). The deliverables include a proposal, a progress check, a final report, and data and code with documentation. Each team member needs to specify their contribution to the project in the report.

Datasets

UCI machine learning archive.
Paperswithcode. The website collects state-of-the-art (SOTA) papers on publicly available datasets. Its datasets are more modern than those in the UCI archive.
Kaggle. The largest machine learning competition platform on earth. It has both small and big, old and modern datasets.

Statement on Academic Integrity

All homework, project and exam submissions should be your own work, by the following definitions:

You can discuss homework questions with your classmates, but you should write the answers on your own. Help received from the Internet, tutors, classmates, and teaching staff should be acknowledged.
Programming submissions should be your own code and written documentation, although you can discuss the algorithms and datasets with your classmates. Trivial modifications of others' code will rarely evade our advanced detection techniques. Help received from the Internet, tutors, classmates, and teaching staff should be acknowledged. If you read open-source code and then write your own, you should note the sources in your project reports.

If you are in doubt about where the line is, consult the instructor for clarification.

The Principles of Our Equitable Community

The Principles of Our Equitable Community: Lehigh University endorses The Principles of Our Equitable Community (www.lehigh.edu/diversity). We expect each member of this class to acknowledge and practice these Principles. Respect for each other and for differing viewpoints is a vital component of the learning environment inside and outside the classroom.
Accommodations for Students with Disabilities: If you have a disability for which you are or may be requesting accommodations, please contact both your instructor and the Office of Academic Support Services, University Center C212 (610-758-4152) as early as possible in the semester. You must have documentation from the Academic Support Services office before accommodations can be granted.
Lehigh University Policy on Harassment and Non-Discrimination: Lehigh University upholds The Principles of Our Equitable Community and is committed to providing an educational, working, co-curricular, social, and living environment for all students, staff, faculty, trustees, contract workers, and visitors that is free from harassment and discrimination on the basis of age, color, disability, gender identity or expression, genetic information, marital or familial status, national or ethnic origin, race, religion, sex, sexual orientation, or veteran status. Such harassment or discrimination is unacceptable behavior and will not be tolerated. The University strongly encourages (and, depending upon the circumstances, may require) students, faculty, staff or visitors who experience or witness harassment or discrimination, or have information about harassment or discrimination in University programs or activities, to immediately report such conduct. If you have questions about Lehigh’s Policy on Harassment and Non-Discrimination or need to report harassment or discrimination, contact the Equal Opportunity Compliance Coordinator (Alumni Memorial Building / 610.758.3535 / eocc@lehigh.edu).