Reinforcement Learning

Fall 2019

When and where 

TR 7:55-9:10AM

Instructor 

Héctor Muñoz-Avila, munoz@cse.lehigh.edu

Instructor's office hours 

TBA 

Texts

Required:

Reinforcement Learning: An Introduction (second edition) Richard S. Sutton and Andrew G. Barto

This book is an excellent read; one of the best technical books that the instructor has read. Read Chapter 1 (particularly Section 1.5) and see for yourself.

Description

Reinforcement learning (RL) is a general learning paradigm where an agent (e.g., a robot) interacts with its environment (e.g., a sewer canal maze) to accomplish some task (e.g., find locations in the sewer with dangerous gas concentration levels). The agent learns through reward signals it receives from the environment. Here is a video introducing the three basic concepts of reinforcement learning: rewards, states, and actions.
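These three concepts come together in a simple interaction loop: at each step the agent observes the current state, chooses an action, and receives a reward along with the next state. Below is a minimal sketch of that loop in Python; the ToyEnvironment corridor, its reward of 1 for reaching the rightmost cell, and the random (untrained) agent are all hypothetical illustrations, not course code.

    import random

    class ToyEnvironment:
        """A 5-cell corridor; the agent starts in the middle cell and receives
        a reward of 1 for reaching the rightmost cell."""
        def reset(self):
            self.state = 2
            return self.state

        def step(self, action):                    # action: -1 (left) or +1 (right)
            self.state = max(0, min(4, self.state + action))
            reward = 1.0 if self.state == 4 else 0.0
            done = self.state in (0, 4)            # the episode ends at either end of the corridor
            return self.state, reward, done

    env = ToyEnvironment()
    state = env.reset()
    done = False
    while not done:                                # one episode: observe the state, take an action,
        action = random.choice([-1, +1])           # receive a reward; this agent is untrained and
        state, reward, done = env.step(action)     # simply acts at random
        print("state =", state, "reward =", reward)

A learning agent would use the observed rewards to improve its choice of actions over time, which is exactly what the algorithms studied in this course do.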

RL is motivated by behavioral psychology. This motivation is illustrated in the following video, which shows the initial training of a dog to take the action of staying put. Food is used as a reward and a firm voice as a punishment.

RL has been shown to be useful in solving a wide variety of tasks, including (click the links to see some videos): autonomous vehicle navigation tasks, robotics, and programming game AI (here is an RL system playing pacman). Reinforcement learning has also been used to explain biological systems such as bee foraging and brain activity.

RL has been applied successfully in a number of fields (for details of these and other applications, see here and here).

In this course, we begin by precisely formulating the reinforcement learning problem. We will study techniques for solving this problem, their limitations, and open research issues. Concepts such as Markov states, the Markov property, Markov decision processes, dynamic programming, and Monte Carlo methods will be covered, as well as modern topics such as deep reinforcement learning, which has resulted in "superhuman" performance in games such as Go and Starcraft 2. For further details, please see this draft of the textbook.
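As a small preview of the kind of algorithm the course covers, here is a minimal sketch of tabular Q-learning, a temporal-difference method, applied to the toy corridor environment sketched above (the environment is repeated so the snippet runs on its own). The step size, discount factor, and exploration rate are illustrative choices, not values prescribed by the course or the textbook.

    import random
    from collections import defaultdict

    class ToyEnvironment:                           # same 5-cell corridor as in the earlier sketch
        def reset(self):
            self.state = 2
            return self.state
        def step(self, action):                     # action: -1 (left) or +1 (right)
            self.state = max(0, min(4, self.state + action))
            reward = 1.0 if self.state == 4 else 0.0
            done = self.state in (0, 4)
            return self.state, reward, done

    ACTIONS = [-1, +1]
    alpha, gamma, epsilon = 0.1, 0.9, 0.1           # step size, discount factor, exploration rate
    Q = defaultdict(float)                          # Q[(state, action)] -> estimated return

    env = ToyEnvironment()
    for episode in range(500):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q[(s, act)])
            s2, r, done = env.step(a)
            # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            best_next = 0.0 if done else max(Q[(s2, act)] for act in ACTIONS)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2

    print({k: round(v, 2) for k, v in sorted(Q.items())})

After training, the printed Q-values should be higher for the "move right" action in every non-terminal state, reflecting the reward at the rightmost cell.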

Question for the "fans": can RL algorithms exhibit overfitting? If so, how can it be countered?

The course has been taught three times; in the course evaluations, students rated it an average of 4.90 (out of 5.00).

Prerequisites

The course is self-contained; no prior knowledge of machine learning is required.

The main requirement is some background in probability (e.g., CSE/Math 231 Probability and Statistics or equivalent, which is required for all students); ECO 45 Statistics may be accepted, but the student will need to contact the instructor for an override.

In addition, for RCEAS majors and all CS majors, including CSB and BA students, programming is required (CSE 109 Systems Programming or equivalent).

CogSci, Psychology, and Business and Economics majors (excluding CSB), who generally will not have taken CSE 109, will need to contact the instructor to request an override, which will be granted on a case-by-case basis. These students still need to have taken the probability and statistics course.

Graduate students from any major will generally be granted an override provided they have taken a probability and statistics course; they will still need to contact the instructor for the override.

Homework

There will be homework assignments most weeks, excluding the two weeks before a programming project is due and any week in which there is a test. In all, there will be a total of 6-7 written homeworks. Homeworks are used to exercise concepts and problem-solving.

Attendance

Class attendance is required.

Tests

There will be two tests but NO final exam. Tests measure the student's understanding of the concepts rather than their problem-solving skills; the latter are tested in the projects and homeworks. So, for example, a test question could ask you to list advantages and disadvantages of using dynamic programming versus temporal-difference methods, whereas a project will ask you to develop and write an algorithm that controls pacman effectively.

Projects

The "warm-up" project will be a programming project involving UC Berkeley's pacman.

The final project will be a programming project involving deep reinforcement learning for all engineering students. We will likely use Deepmind's Starcraft 2 environment.

Non-Engineering students will either (1) use existing RL software, run experiments with it, and write a report analyzing the results; or (2) write a literature-review report.

Grading

Last update: Wed Mar. 13 16:47:57 EDT 2019