The goal of this course is to provide a comprehensive overview of the mathematical theory behind machine learning. How can we characterize a good prediction? How can we construct good predictions using machine learning methods? What is the relationship between (1) estimation error, (2) sample size and (3) model complexity? How do these abstract concepts apply to particular machine learning methods such as boosting, support vector machines, Ridge and LASSO? The course aims to give detailed and intuitively clear answers to these questions. As a result, participants will be well prepared for both theoretical and empirical work on and with machine learning methods.
1. Principles of statistical theory (loss function and risk, approximation vs estimation error, no free lunch theorems)
2. Concentration inequalities for bounded loss functions (Hoeffding’s lemma, the Azuma–Hoeffding inequality, the bounded difference inequality, Bernstein’s inequality, McDiarmid’s inequality)
3. Classification (the binary case and its loss function, the Bayes classifier, optimality of the Bayes classifier, oracle inequalities for the Bayes classifier, learning with a finite dictionary, the impact of noise on convergence rates, infinite dictionaries)
4. General case (general loss functions, symmetrization, Rademacher complexity, covering numbers, chaining)
5. Applications Part 1: support vector machines, boosting
6. The mathematics and statistics of regularization methods (LASSO, Ridge, elastic net)
7. Applications Part 2: applying LASSO and Ridge