General Information
- Instructor: Anthony Man-Cho So (manchoso at se.cuhk.edu.hk)
- Office Hours: Wednesdays 3:00pm - 4:30pm or by appointment, in ERB 604
- Lecture Time/Location:
- Mondays 11:30am - 1:15pm, in LSB C2
- Tuesdays 11:30am - 1:15pm, in HYS LG04
Course Description
The prevalence of high-dimensional data has motivated active research on efficient methods for tackling optimization problems that arise in statistical analysis. In this course, we will give an introduction to this exciting area of research, with emphasis on the theory of structured regularizers for high-dimensional statistics and the design and analysis of statistically and computationally efficient optimization algorithms. Applications in various areas of science and engineering, such as machine learning, signal processing, and statistics, will also be discussed. Prerequisite: ENGG 5501 or equivalent.
Course Outline (subject to change)
Course Requirements
Homework sets (60%) and a take-home final examination (40%)
Open problems will be introduced throughout the semester. Students who solve any one of the open problems will automatically receive an A for the course.
General References
- Yu. Nesterov: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, 2004.
Schedule and Reading
- Week 1 - Jan 10: Introduction
- Week 2 - Jan 16: The notions of decomposable regularizer and restricted strong convexity and their consequences.
Jan 17: Bound on statistical error for general models. Application to sparse linear regression. (A schematic form of the bound is given after this week's reading.)
Reading:
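- For orientation, here is a schematic version of the kind of error bound covered in this lecture, in the spirit of the decomposable-regularizer framework; the constant c and the notation (loss L_n, regularizer R with dual R^*, subspace pair (M, M-bar), curvature kappa) are generic placeholders rather than statements taken verbatim from any paper or the lecture notes.

```latex
% Schematic estimation error bound (generic constant c):
% if the empirical loss L_n satisfies restricted strong convexity with curvature \kappa,
% the regularizer R is decomposable over the subspace pair (M, \bar{M}^\perp),
% and the regularization parameter satisfies \lambda_n \ge 2 R^*(\nabla L_n(\theta^*)), then
\[
  \|\hat{\theta} - \theta^*\|_2 \;\le\; \frac{c\,\lambda_n}{\kappa}\,\Psi(\bar{M}),
  \qquad
  \Psi(\bar{M}) := \sup_{\theta \in \bar{M}\setminus\{0\}} \frac{R(\theta)}{\|\theta\|_2}.
\]
% For sparse linear regression with an s-sparse \theta^*, R = \|\cdot\|_1, and
% \lambda_n \asymp \sigma \sqrt{\log p / n}, this specializes to the familiar rate
\[
  \|\hat{\theta} - \theta^*\|_2 \;\lesssim\; \frac{\sigma}{\kappa}\sqrt{\frac{s \log p}{n}}.
\]
```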
- Week 3 - Jan 23, 24: Establishing the restricted strong convexity property for sparse linear regression. (A schematic statement of the key lower bound is given after this week's reading list.)
Reading:
- Raskutti, Wainwright, Yu. Restricted Eigenvalue Properties for Correlated Gaussian Designs. Journal of Machine Learning Research 11: 2241-2259, 2010.
- The proof of Gordon's inequality can be found in Section 8.7 of
Foucart, Rauhut. A Mathematical Introduction to Compressive Sensing. Springer Science+Business Media, New York, 2013.
- For an extension of Raskutti et al.'s result to non-Gaussian settings, see
Rudelson, Zhou. Reconstruction from Anisotropic Random Measurements. IEEE Transactions on Information Theory 59(6): 3434-3447, 2013.
Zhou. Restricted Eigenvalue Conditions on Subgaussian Random Matrices. Technical Report, 2009.
- Lecture Notes (Part 1, Part 2)
- Besides sparse linear regression, matrix completion also enjoys the restricted strong convexity property; see
Negahban, Wainwright. Restricted Strong Convexity and Weighted Matrix Completion: Optimal Bounds with Noise. Journal of Machine Learning Research 13: 1665-1697, 2012.
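- As a rough guide to what "establishing restricted strong convexity" means here, the following is a schematic form of the lower bound proved by Raskutti, Wainwright and Yu for Gaussian designs; the constants c_1, c_2, c, c' are generic placeholders rather than the values appearing in the paper.

```latex
% Schematic statement: if the rows of X \in \mathbb{R}^{n \times p} are i.i.d. N(0, \Sigma), then
% with probability at least 1 - c' e^{-c n}, every v \in \mathbb{R}^p satisfies
\[
  \frac{\|Xv\|_2}{\sqrt{n}} \;\ge\; c_1 \|\Sigma^{1/2} v\|_2
      \;-\; c_2\, \rho(\Sigma)\,\sqrt{\frac{\log p}{n}}\;\|v\|_1,
\]
% where \rho^2(\Sigma) is the largest diagonal entry of \Sigma.  Restricting v to a cone
% such as \{ \|v_{S^c}\|_1 \le 3 \|v_S\|_1 \} for an s-sparse support S turns this into a
% restricted eigenvalue / restricted strong convexity condition once n \gtrsim s \log p.
```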
- Week 4 - Jan 30, 31: Chinese New Year.
- Week 5 - Feb 6, 7: Algorithmic aspects of regularized loss minimization problems - The proximal gradient method and its convergence analysis. (A minimal code sketch of the method is given after this week's reading list.)
Reading:
- The lecture material is based on
Hou, Zhou, So, Luo. On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization. Advances in Neural Information Processing Systems 26: Proceedings of the 2013 Conference (NIPS 2013), pp. 710-718, 2013.
See, in particular, Theorem 3.2.
WARNING: The conclusion of Theorem 3.1 of the above paper is not correct. This is due to a flaw in the proof. The correct conclusion appears in Proposition 12 of
Zhou, So. A Unified Approach to Error Bounds for Structured Convex Optimization Problems. Mathematical Programming, Series A, to appear, 2016.
- Some useful properties of the proximal map can be found in
Combettes, Wajs. Signal Recovery by Proximal Forward-Backward Splitting. Multiscale Modeling and Simulation 4(4): 1168-1200, 2005.
Rockafellar, Wets. Variational Analysis. Grundlehren der Mathematischen Wissenschaften 317, Springer-Verlag, 2004. (Particularly Sections 1.G and 2.E)
- Lecture Notes
The lemma on p.6 of the notes (concerning certain monotonicity properties of the proximal map) first appears in
Sra. Scalable Nonconvex Inexact Proximal Splitting. Advances in Neural Information Processing Systems 25: Proceedings of the 2012 Conference (NIPS 2012), pp. 530-538, 2012.
However, to obtain the desired conclusion, an envelope theorem should be invoked in the proof. In fact, just having a unique solution to displayed equation (18) of the above paper is not sufficient to guarantee the differentiability of the function E_g. The envelope theorem mentioned on p.8 of the notes comes from Proposition 1 of
Oyama, Takenawa. On the (Non-)Differentiability of the Optimal Value Function When the Optimal Solution Is Unique. Manuscript, 2013.
Envelope theorems have many applications in economics; see the above paper and the pointers therein for details.
- The statistical nature of regularized loss minimization problems can be exploited to develop an alternative convergence analysis of first-order methods. This is done, e.g., in
Agarwal, Negahban, Wainwright. Fast Global Convergence of Gradient Methods for High-Dimensional Statistical Recovery. Annals of Statistics 40(5): 2452-2482, 2012. Supplementary Material
The above paper shows that for several classes of regularized loss minimization problems, certain first-order methods enjoy global linear convergence up to the statistical precision of the model. Modulo the global vs. local nature of the linear convergence, this is weaker than the convergence result derived from the error bound theory. Nevertheless, the approach developed in the above paper can be used to tackle problems that are not known to possess an error bound.
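- A minimal code sketch of the proximal gradient method, specialized to l1-regularized least squares (the Lasso). It assumes a least-squares loss, soft-thresholding as the proximal map of the l1-norm, and a constant step size 1/L with L the Lipschitz constant of the gradient; all names and parameter choices below are illustrative and are not taken from the lecture notes or the papers above.

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal map of tau * ||.||_1 (entrywise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def proximal_gradient_lasso(A, b, lam, num_iters=500):
    """Proximal gradient (ISTA) for min_x (1/(2n)) * ||Ax - b||^2 + lam * ||x||_1.

    Uses a constant step size 1/L, where L = lambda_max(A^T A) / n is the
    Lipschitz constant of the gradient of the smooth part.
    """
    n, p = A.shape
    L = np.linalg.norm(A, 2) ** 2 / n
    x = np.zeros(p)
    for _ in range(num_iters):
        grad = A.T @ (A @ x - b) / n                   # gradient of the least-squares loss
        x = soft_threshold(x - grad / L, lam / L)      # proximal (soft-thresholding) step
    return x

# Illustrative usage on synthetic sparse-regression data
rng = np.random.default_rng(0)
n, p, s = 200, 500, 10
A = rng.standard_normal((n, p))
x_true = np.zeros(p)
x_true[:s] = 1.0
b = A @ x_true + 0.1 * rng.standard_normal(n)
x_hat = proximal_gradient_lasso(A, b, lam=0.05)
print("estimation error:", np.linalg.norm(x_hat - x_true))
```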
- Week 6 - Feb 13, 14: Algorithmic aspects of regularized loss minimization problems - Error bounds.
Reading:
- The lecture material is based on
Zhou, So. A Unified Approach to Error Bounds for Structured Convex Optimization Problems. Mathematical Programming, Series A, to appear, 2016.
- Lecture Notes
- A streamlined proof of Hoffman's error bound (restated in schematic form after this week's reading list) can be found in Section 11.8 of
Güler. Foundations of Optimization. Graduate Texts in Mathematics 258, Springer Science+Business Media, LLC, 2010.
- The notion of linear regularity used in Corollary 1 of the lecture notes and other related notions are discussed at length in
Bauschke, Borwein. On Projection Algorithms for Solving Convex Feasibility Problems. SIAM Review 38(3): 367-426, 1996.
- The outer Lipschitz continuity property used in the lecture is discussed in Sections 3C and 3D (particularly Theorem 3D.1) of
Dontchev, Rockafellar. Implicit Functions and Solution Mappings. Springer Monographs in Mathematics, Springer Science+Business Media, LLC, 2009.
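- For reference, here is a schematic statement of Hoffman's error bound as used this week; the choice of norm and the form of the constant are left generic rather than taken from any particular source.

```latex
% Hoffman's error bound (schematic statement): for a nonempty polyhedron
% P = \{ x : Ax \le b \}, there is a constant \tau > 0, depending only on A,
% such that for every x,
\[
  \operatorname{dist}(x, P) \;\le\; \tau \,\bigl\| (Ax - b)_+ \bigr\|,
\]
% i.e., the distance to the feasible set is controlled by the amount of
% constraint violation.  Error bounds of this type drive the linear
% convergence analyses discussed in this week's lectures.
```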
- Week 7 - Feb 20, 21: Design and analysis of a successive quadratic approximation method for regularized loss minimization problems. (The generic form of the subproblem is recalled below.)
Reading:
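- As a rough template for the class of methods covered this week (not the specific variant analyzed in the lecture): for a composite problem min_x f(x) + r(x), with f smooth and r the regularizer, a successive quadratic approximation (proximal Newton-type) method generates iterates by solving

```latex
\[
  x_{k+1} \;\in\; \arg\min_{x}\;\Bigl\{\, \nabla f(x_k)^\top (x - x_k)
      + \tfrac{1}{2}\,(x - x_k)^\top H_k\, (x - x_k) + r(x) \,\Bigr\},
\]
% where H_k is a positive definite approximation of the Hessian of f at x_k,
% possibly followed by a line search on the step x_{k+1} - x_k.
% Taking H_k = (1/\eta) I recovers the proximal gradient step with step size \eta.
```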
- Week 8 - Feb 27, 28: Sketching and its applications to the design and analysis of second-order methods. (A minimal code sketch of a sketched Newton step is given after this week's reading list.)
Reading:
- The lecture material is based on
Pilanci, Wainwright. Newton Sketch: A Near Linear-Time Optimization Algorithm with Linear-Quadratic Convergence. SIAM Journal on Optimization, to appear, 2016.
As noted in the lecture, the approach in the above paper does not offer any advantage for unconstrained problems, as the sketching dimension could be as high as the dimension of the decision vector. In fact, the same is true for a constrained problem if the optimal solution lies in the interior of the constraint set; see displayed equation (3.8) in the above paper.
- Lecture Notes
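- A minimal code sketch of a sketched Newton step, using unregularized logistic regression and a Gaussian sketch of the Hessian square root purely to illustrate the mechanics. The function names, the choice of sketch, and all parameters below are illustrative assumptions, not the Newton Sketch algorithm of Pilanci and Wainwright as analyzed in the paper (which also covers constraints, self-concordant losses, and structured sketches).

```python
import numpy as np

def sigmoid(t):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-t))

def newton_sketch_logreg(A, y, sketch_dim, num_iters=20, step=1.0):
    """Illustrative sketched Newton iteration for unregularized logistic regression.

    The Hessian (1/n) A^T D A is replaced by (1/n) (S B)^T (S B), where
    B = D^{1/2} A is a Hessian square root and S is an m x n Gaussian sketch
    with E[S^T S] = I.  Step-size rules, stopping criteria, and the constrained
    case are omitted.
    """
    n, p = A.shape
    rng = np.random.default_rng(0)
    x = np.zeros(p)
    for _ in range(num_iters):
        margins = y * (A @ x)
        probs = sigmoid(margins)
        grad = -(A.T @ (y * sigmoid(-margins))) / n          # gradient of the logistic loss
        d_half = np.sqrt(probs * (1.0 - probs))              # square roots of the Hessian weights
        S = rng.standard_normal((sketch_dim, n)) / np.sqrt(sketch_dim)
        SB = S @ (d_half[:, None] * A)                       # sketched Hessian square root
        H_sketch = SB.T @ SB / n + 1e-8 * np.eye(p)          # small damping for numerical safety
        x = x - step * np.linalg.solve(H_sketch, grad)       # sketched Newton step
    return x

# Illustrative usage on synthetic data drawn from a logistic model
rng = np.random.default_rng(1)
n, p, m = 2000, 20, 200
A = rng.standard_normal((n, p))
x_true = rng.standard_normal(p)
y = np.where(rng.random(n) < sigmoid(A @ x_true), 1.0, -1.0)
x_hat = newton_sketch_logreg(A, y, sketch_dim=m)
cosine = np.dot(x_hat, x_true) / (np.linalg.norm(x_hat) * np.linalg.norm(x_true))
print("cosine similarity with ground truth:", cosine)
```

As noted above, for a small unconstrained problem like this one the sketch dimension needed for accuracy is comparable to the dimension of the decision vector, so the example only illustrates the mechanics of the method rather than any computational advantage.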
- Week 9 - Mar 6, 7: Non-convex regularized loss minimization for sparse linear regression.
Reading:
- Week 10 - Mar 13: Non-convex regularizers in statistical estimation problems.
Mar 14: Bound on statistical error in the non-convex setting.
Reading:
- Week 11 - Mar 20, 21: Bound on optimization error in the non-convex setting.
Reading:
- Week 12 - Mar 27: Phase Synchronization: Formulation and preliminary observations.
Mar 28: The Generalized Power Method and bound on estimation error. (A minimal code sketch of the method is given after this week's reading list.)
Reading:
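- A minimal code sketch of a plain generalized power method for phase synchronization under the model C = z z^H + sigma * W, with spectral initialization and entrywise projection onto the unit circle. Shifted/step-size variants, stopping rules, and the parameter choices used in the lecture's analysis are omitted, and all names and parameters below are illustrative.

```python
import numpy as np

def generalized_power_method(C, num_iters=50):
    """Illustrative (plain) generalized power method for phase synchronization.

    Spectral initialization, then repeatedly multiply by C and project each
    entry back onto the unit circle.  Shifted/step-size variants and stopping
    rules from the literature are omitted.
    """
    _, vecs = np.linalg.eigh(C)               # eigenvalues in ascending order
    z = vecs[:, -1]                           # leading eigenvector
    z = z / np.abs(z)                         # entrywise projection onto the unit circle
    for _ in range(num_iters):
        w = C @ z
        z = w / np.abs(w)
    return z

# Illustrative usage: C = z* z*^H + sigma * W, with W a Hermitian Gaussian noise matrix
rng = np.random.default_rng(0)
n, sigma = 200, 1.0
z_true = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, n))
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
W = (G + G.conj().T) / 2.0                    # Hermitian noise with roughly unit-variance entries
C = np.outer(z_true, z_true.conj()) + sigma * W
z_hat = generalized_power_method(C)
phase = np.vdot(z_hat, z_true)                # optimal global phase alignment
err = np.linalg.norm(z_hat * (phase / np.abs(phase)) - z_true)
print("estimation error (modulo a global phase):", err)
```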
- Week 13 - Apr 3: Error bound for Phase Synchronization.
Apr 4: Ching Ming Festival.
Reading:
- Week 14 - Apr 10: Error bound for Phase Synchronization (cont'd).
Apr 11: The Generalized Power Method for MIMO Detection.
Reading:
- Week 15 - Apr 17: Easter Monday.
Apr 18: Wrap-up and possible future studies.
Reading:
- Inexact gradient methods do not necessarily have the sufficient decrease property. Nevertheless, under appropriate assumptions, their convergence rates can be analyzed using the error bound framework; see, e.g.,
So, Zhou. Non-Asymptotic Convergence Analysis of Inexact Gradient Methods for Machine Learning Without Strong Convexity. Accepted for publication in Optimization Methods and Software, 2017.
- Here are some recent articles on the analysis of non-convex optimization problems; see also the references therein:
Sun, Qu, Wright. When are Nonconvex Problems not Scary? Preprint, 2016.
Ge, Jin, Zheng. No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis. Manuscript, 2017.
- There has been some recent interest in applying difference-of-convex optimization techniques to statistical optimization problems; see
Yin, Lou, He, Xin. Minimization of l1-2 for Compressed Sensing. SIAM Journal on Scientific Computing 37(1): A536-A563, 2015.
Pang, Razaviyayn, Alvarado. Computing B-Stationary Points of Nonsmooth DC Programs. Mathematics of Operations Research 42(1): 95-118, 2017.
Nouiehed, Pang, Razaviyayn. On the Pervasiveness of Difference-Convexity in Optimization and Statistics. Manuscript, 2017.
Homework Sets
Last Updated: April 27, 2017