Course Description: Machine learning is the computational study of artificial systems that can adapt to novel situations, discover patterns from data, and improve performance with practice. This course will cover the mathematical foundation of supervised and unsupervised learning. The course will provide a state-of-the-art overview of the field, with an emphasis on implementing and deriving learning algorithms for a variety of models from first principles. 3 credits.
Accommodations for Remote Instruction in Fall 2020: Due to the ongoing disruptions caused by COVID-19, this course will operate in a fully remote mode for Fall 2020. The course will implement the following accommodations for remote instruction:
- Lecture material will be provided asynchronously as pre-recorded videos to accommodate students from different time zones.
- Towards the start of the semester, students will be able to sign up for an optional synchronous small group discussion section. Each section will meet once per week for one hour via Zoom. Enrollment in each section will be limited to 15 students. To the best of our ability, meeting times will be selected to accommodate students’ time zones. These sections will be run by PhD student teaching assistants. Prof. Marlin will attend half of these sections each week (we expect to run up to 10 sections).
- As in past years, online platforms will be used to distribute all content. The course website will be hosted on the University’s Moodle platform. Assignment material and quizzes will be distributed through Moodle. Lecture video content will be distributed through Echo 360. The course will use Piazza for a discussion forum. Assignments will be submitted through Gradescope. Links to content on all platforms will be provided through the course’s Moodle site.
Detailed course topics: Overview of supervised and unsupervised learning; mathematical foundations of numerical optimization and statistical estimation; maximum likelihood and maximum a posteriori (MAP) estimation; missing data and expectation maximization (EM); graphical models, including mixture models and hidden Markov models; logistic regression and generalized linear models; maximum entropy and undirected graphical models; nonparametric models, including nearest neighbor methods and kernel-based methods; and dimensionality reduction methods (PCA and LDA). The course will focus on deriving learning algorithms from first principles and implementing them from scratch.
- Location: Online only.
- Website: The course website will be hosted on Moodle.
Textbook: The course will use Machine Learning: A Probabilistic Perspective by Kevin Murphy as the course text. This text is available to UMass students for free through the UMass library.
Computing: Access to a relatively modern computer will be required to complete the assignments for the course. The course will use Python as its programming language.
Required Background: This course requires a strong mathematical background in probability and statistics, multivariate calculus and linear algebra. See below for recommended preparation over the summer.
What is the difference between COMPSCI 689 and COMPSCI 589?: 589 was designed to focus on understanding and applying core machine learning models and algorithms. 689 focuses on the mathematical foundations of machine learning, with an emphasis on deriving and implementing machine learning algorithms for novel models from scratch. The course is primarily intended for students interested in pursuing research on machine learning models and algorithms. It focuses on the math-to-code-to-experiments-to-results pipeline needed to take machine learning research ideas from conception to publication. While both 589 and 689 require a background in multivariate calculus, linear algebra, and probability, 689 will use more of this background material than 589.
Who Should Take COMPSCI 689?: 689 is primarily intended as an AI area core course for doctoral stream students. Without exception, undergraduate students should take COMPSCI 589 before applying for an override for COMPSCI 689. Professional MS students and other graduate students from outside computer science should also take COMPSCI 589 before attempting COMPSCI 689 unless they have a prior undergraduate background in machine learning or an extremely strong background in mathematics, statistics, and programming (for example, an undergraduate degree in mathematical computing).
What Should I Do to Prepare to Take 689?
- Make sure 689 is the right course for you and this is the right time to take it. See the suggestions above about 589 vs 689.
- Set up your schedule to accommodate the course. All students are strongly advised against taking 689 in combination with any other PhD-level core course unless they have extremely strong backgrounds in all areas. You can make up gaps in background at the same time you learn primary course material, but you will need to be prepared to devote extra time to the course to do so.
- Start addressing gaps or weaknesses in your background now. 689 starts with the assumption that you have sufficient background knowledge of linear algebra, vector calculus, multivariate probability, and Python, and will integrate aspects of these topics together from the outset (e.g., using differential calculus to derive a method for optimizing the parameters of a multivariate probability density over a vector space and then implementing the method in Python). The course does not cover background topics, but to help you prepare we have assembled a reading list that covers what you need to know to get started in the course. Reviewing all of the material below with a focus on weaker areas is a good strategy for all students. The specific sources below may cover material at a deeper level than is included in some undergrad CS programs (for example, computational complexity of linear algebra operations), so all students may want to at least skim this material.
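To make the math-to-code example in the last item concrete, here is a minimal sketch that is not taken from the course materials. It assumes the density is a multivariate Gaussian fit by maximum likelihood. Setting the gradients of the log-likelihood with respect to the mean and covariance to zero gives the sample mean and the sample covariance normalized by N (not N - 1). The function names and the synthetic data are illustrative assumptions.

```python
import numpy as np


def gaussian_mle(X):
    """Closed-form maximum likelihood estimates for a multivariate Gaussian.

    X has shape (N, D): N samples, each a D-dimensional vector.
    """
    N = X.shape[0]
    mu = X.mean(axis=0)                # zero of the gradient w.r.t. the mean
    centered = X - mu
    sigma = centered.T @ centered / N  # zero of the gradient w.r.t. the covariance
    return mu, sigma


def gaussian_avg_log_likelihood(X, mu, sigma):
    """Average log density of the rows of X under N(mu, sigma)."""
    D = X.shape[1]
    centered = X - mu
    # Solve a linear system rather than forming the explicit inverse of sigma.
    solved = np.linalg.solve(sigma, centered.T).T
    mahalanobis = np.sum(centered * solved, axis=1)
    _, logdet = np.linalg.slogdet(sigma)
    return np.mean(-0.5 * (D * np.log(2 * np.pi) + logdet + mahalanobis))


# Synthetic data with hypothetical "true" parameters, for illustration only.
rng = np.random.default_rng(0)
true_mu = np.array([1.0, -2.0])
true_sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
X = rng.multivariate_normal(true_mu, true_sigma, size=5000)

mu_hat, sigma_hat = gaussian_mle(X)
ll_at_mle = gaussian_avg_log_likelihood(X, mu_hat, sigma_hat)
ll_at_truth = gaussian_avg_log_likelihood(X, true_mu, true_sigma)
```

Comparing `ll_at_mle` with `ll_at_truth` is a quick sanity check on the derivation: on the data that produced them, the estimates should score at least as well as any other parameter setting, including the parameters that generated the data.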
Suggested Reading List:
Covering the math in the order listed below is likely to be most helpful. For calculus, Corral or Marsden and Tromba can be used. Marsden and Tromba is more detailed, but Corral will do. All texts are open access or freely available through the UMass Library (links provided), except for Marsden and Tromba. The course’s Piazza site will open at the beginning of the summer to facilitate discussion of background material among students.
- Stephen Boyd and Lieven Vandenberghe. Introduction to Applied Linear Algebra.
- Chapter 1: Vectors
- Chapter 2.1: Linear Functions
- Chapter 3: Norm and Distance
- Chapter 5: Linear Independence
- Chapter 6: Matrices
- Chapter 8: Linear Equations (Can skip 8.2)
- Chapter 10: Matrix Multiplication
- Chapter 11: Matrix Inverses
- Stephen Boyd and Lieven Vandenberghe. Convex Optimization. (Covers additional linear algebra background missing from the Applied text)
- Appendix A.1, A.3, A.4, A.5
- Appendix C.1, C.2, C.3, C.4
- Michael Corral. Vector Calculus
- Chapter 1: Vectors in Euclidean Space (1.1 to 1.6, 1.8)
- Chapter 2: Functions of Several Variables (2.1 to 2.5)
- Chapter 3: Double Integrals (3.1, 3.3, 3.4, 3.7)
- Marsden and Tromba. Vector Calculus
- Chapter 1: Geometry of Euclidean Space (1.1, 1.2, 1.3, 1.5)
- Chapter 2: Differentiation (2.1, 2.2, 2.3, 2.5, 2.6)
- Chapter 3: Higher Order Derivatives (3.1, 3.3)
- Chapter 4: Vector Valued Functions (4.1)
- Chapter 5: Double and Triple Integrals (5.1, 5.2, 5.5)
- Bishop. Pattern Recognition and Machine Learning (probability from an ML perspective)
- Chapter 1: Introduction (1.2)
- Chapter 2: Probability Distributions (2.1, 2.2, 2.3, 2.4)
- Murphy. Machine Learning: A Probabilistic Perspective (more probability from an ML perspective)
- Chapter 2: Probability
- Python background (NumPy, SciPy, PyTorch)
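As a rough self-check of the Python background listed above, the sketch below combines the calculus, probability, and NumPy pieces in one exercise. It is an illustrative assumption about the expected level, not an official course exercise. It writes the average negative log-likelihood of logistic regression, codes the hand-derived gradient, and compares that gradient against a central finite-difference approximation. All function names and the random data are hypothetical.

```python
import numpy as np


def nll(w, X, y):
    """Average negative log-likelihood of logistic regression, labels y in {0, 1}."""
    z = X @ w
    # log(1 + exp(z)) computed stably via logaddexp(0, z).
    return np.mean(np.logaddexp(0.0, z) - y * z)


def nll_grad(w, X, y):
    """Hand-derived gradient of nll with respect to w."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))  # sigmoid of the linear score
    return X.T @ (p - y) / X.shape[0]


def finite_difference_grad(f, w, eps=1e-6):
    """Central-difference approximation to the gradient of f at w."""
    g = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = eps
        g[i] = (f(w + step) - f(w - step)) / (2 * eps)
    return g


# Random inputs and labels, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (rng.uniform(size=200) < 0.5).astype(float)
w = rng.normal(size=3)

analytic = nll_grad(w, X, y)
numeric = finite_difference_grad(lambda v: nll(v, X, y), w)
max_gap = np.max(np.abs(analytic - numeric))
```

If the derivation and the code agree, `max_gap` should be tiny. A large gap usually points to an algebra slip or a shape mistake.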