Notes

Projected Gradient Descent

As a follow-up to my previous note on convex optimization, this note studies the projected gradient descent method and its sibling, proximal gradient descent. Using the fundamental inequalities from convex analysis, we shall show that both methods enjoy convergence guarantees similar to those of gradient descent for unconstrained optimization.
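
To make the setting concrete, here is a minimal sketch of the projected gradient iteration $x_{k+1} = P_C(x_k - \eta \nabla f(x_k))$, using a box constraint so that the projection reduces to clipping. The test problem, function names, and step-size choice below are illustrative assumptions of mine, not taken from the note.

```python
import numpy as np

def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi]^n, which reduces to clipping."""
    return np.clip(x, lo, hi)

def projected_gradient_descent(grad, project, x0, step, iters=200):
    """Iterate x_{k+1} = P_C(x_k - step * grad(x_k))."""
    x = x0
    for _ in range(iters):
        x = project(x - step * grad(x))
    return x

# Illustrative problem: minimize 0.5 * ||A x - b||^2 subject to 0 <= x <= 1.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
grad = lambda x: A.T @ (A @ x - b)     # gradient of the quadratic objective
L = np.linalg.norm(A, 2) ** 2          # smoothness constant (largest eigenvalue of A^T A)
x_hat = projected_gradient_descent(
    grad, lambda x: project_box(x, 0.0, 1.0), np.zeros(5), step=1.0 / L
)
print(x_hat)
```

The step size $1/L$ mirrors the unconstrained analysis, which is what lets the convergence argument carry over with little change.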

Matrix Perturbation and Davis-Kahan Theorem

In this note we will study matrix perturbation theory and answer some basic questions, such as what happens when a small perturbation is added to a symmetric matrix, and how much the invariant subspace spanned by its eigenvectors can change. Understanding the effect of small perturbations on matrices is key to analyzing local convergence in many optimization algorithms.
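
As a quick numerical illustration of the flavor of result involved, the snippet below perturbs a symmetric matrix and compares the rotation of its leading eigenvector to one common form of the Davis-Kahan $\sin\theta$ bound, $\sin\theta \le 2\|E\|_2/(\lambda_1-\lambda_2)$ (the Yu–Wang–Samworth variant). The constant differs across statements of the theorem, and this sketch is my own illustration rather than necessarily the version analyzed in the note.

```python
import numpy as np

rng = np.random.default_rng(1)

# Symmetric matrix with a clear gap between its two largest eigenvalues.
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
A = Q @ np.diag([5.0, 3.0, 1.0, 0.5, 0.2, 0.1]) @ Q.T

# Small symmetric perturbation E.
E = rng.standard_normal((6, 6))
E = 1e-2 * (E + E.T) / 2

def top_eig(M):
    """Return all eigenvalues (ascending) and the leading eigenvector."""
    w, V = np.linalg.eigh(M)
    return w, V[:, -1]

w, v = top_eig(A)
_, v_tilde = top_eig(A + E)

# Angle between the leading eigenvectors before and after the perturbation.
cos = min(1.0, abs(v @ v_tilde))
sin_theta = np.sqrt(1.0 - cos ** 2)
bound = 2 * np.linalg.norm(E, 2) / (w[-1] - w[-2])   # 2 ||E|| / (lambda_1 - lambda_2)
print(f"sin(theta) = {sin_theta:.2e} <= bound = {bound:.2e}")
```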

Some Useful Inequalities in Convex Optimization

This note collects a set of inequalities for smooth convex functions. As an introduction, we shall prove the convergence of the gradient descent method using some of these inequalities. In particular, we show that gradient descent with a fixed step size converges to a global minimum at a sublinear rate when the objective is smooth and convex, and at a linear rate when strong convexity is added. It also surprised me that, while the loose bound $1-\mu/L$ is commonly used in standard convex analysis, the tight bound $(1-\mu/L)^2$ is a lesser-known fact that can hardly be found in notes or lectures on this topic.
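
For reference, here is the contrast as I read it, assuming step size $1/L$ for an $L$-smooth, $\mu$-strongly convex objective $f$ with minimizer $x^*$ (the note itself should be consulted for the exact step-size convention):

$$
x_{k+1} = x_k - \tfrac{1}{L}\nabla f(x_k):\qquad
\|x_{k+1}-x^*\|^2 \le \Bigl(1-\tfrac{\mu}{L}\Bigr)\|x_k-x^*\|^2 \ \text{(loose)},\qquad
\|x_{k+1}-x^*\|^2 \le \Bigl(1-\tfrac{\mu}{L}\Bigr)^{2}\|x_k-x^*\|^2 \ \text{(tight)}.
$$

One standard route to the tight factor uses the co-coercivity inequality for strongly convex functions, $\langle \nabla f(x)-\nabla f(y),\, x-y\rangle \ge \tfrac{\mu L}{\mu+L}\|x-y\|^2 + \tfrac{1}{\mu+L}\|\nabla f(x)-\nabla f(y)\|^2$, whereas the loose factor follows from strong convexity alone.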

Estimating a Dirichlet Distribution using Expectation–Maximization

This note describes the EM algorithm for estimating a Dirichlet distribution. One result that I found particularly interesting is that the posterior distributions of the gamma-distributed random variables, given an observation of the Dirichlet-distributed vector they induce, are also gamma distributions.
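
A sketch of how I read that result, using the standard construction of a Dirichlet vector from independent gamma variables (the notation here is mine, not necessarily the note's): let $X_i \sim \mathrm{Gamma}(\alpha_i, 1)$ independently for $i=1,\dots,K$, set $S=\sum_j X_j$ and $P = (X_1,\dots,X_K)/S \sim \mathrm{Dir}(\alpha_1,\dots,\alpha_K)$. Since $S \sim \mathrm{Gamma}(\alpha_0,1)$ with $\alpha_0=\sum_j\alpha_j$ and $S$ is independent of $P$,

$$
X_i \mid P = p \;\overset{d}{=}\; p_i\,S \;\sim\; \mathrm{Gamma}\bigl(\alpha_0,\ \text{scale}=p_i\bigr),
$$

so each latent gamma variable remains gamma-distributed after conditioning on the observed Dirichlet vector.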

Multinomial Logistic Regression: Convexity and Smoothness

Multinomial logistic regression (MLR) is a generalization of binary logistic regression to multiclass problems. This note explains the nice geometry of the likelihood function used to estimate the model parameters by examining the Hessian of the MLR objective function.
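
For concreteness, one common way to write that Hessian, assuming the softmax model $p_n=\operatorname{softmax}(W^{\top}x_n)$ with parameter matrix $W$ and the negative log-likelihood $\ell(W)$ as the objective (the parameterization in the note may differ, e.g. with one class pinned as a reference):

$$
\nabla^2 \ell(W) \;=\; \sum_{n=1}^{N} \bigl(\operatorname{diag}(p_n) - p_n p_n^{\top}\bigr) \otimes x_n x_n^{\top},
$$

where the Kronecker product is taken with respect to a vectorized $W$. Since $0 \preceq \operatorname{diag}(p_n) - p_n p_n^{\top} \preceq \tfrac12 I$, the objective is convex, and its gradient is Lipschitz with a constant controlled by $\tfrac12\sum_n \|x_n\|^2$.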