Date
 Slides
 Related Links

July 7
 How many iterations of gradient descent do we need?
 Cauchy's 1847 paper,
Lipschitz relationships,
practical linesearches,
PL condition.

July 14
 Momentum, acceleration, and second-order methods
 heavy-ball,
CG,
SSO,
accelerated gradient,
restarting,
quadratic convergence,
damped Newton (Section 9.5),
cubic regularization.

July 21
 Coordinate optimization and stochastic gradient descent
 random coordinate descent,
shuffle coordinate descent,
Gauss-Southwell,
block coordinate descent,
accelerated coordinate descent.

July 28
 SGD with Constant Step Sizes, Growing Batches, and Over-Parameterization
 nonconvex SGD,
decreasing step SGD,
constant step SGD,
shuffle SGD,
growing batch size,
SGC,
accelerated SGD,
nonuniform SGD,
SGD + Armijo.

August 4
 No lecture


August 11
 Variance reduction and 1.5-Order Methods
 SAG,
SVRG,
nonuniform sampling,
acceleration,
loopless SVRG,
SGD*,
SVRG for deep learning,
diagonal approximation,
Hessian-free Newton,
minibatch Hessian,
Newton sketch,
2.5-order,
Barzilai-Borwein,
quasi-Newton (superlinear),
L-BFGS,
initialization,
L-BFGS preconditioning,
explicit superlinear

August 18+
 Baby break


January 27
 Projected Gradient, Projected Newton, and Frank-Wolfe
 Translation of original PG and PN paper,
projection onto simple sets (Section 8.1),
Dykstra's algorithm,
active set identification and PG backtracking,
spectral projected gradient,
two-metric projection,
projected quasi-Newton,
projected coordinate descent,
Frank-Wolfe

February 17
 Global Optimization, Subgradients, and Cutting Planes

Random search,
Bayesian optimization,
harmless global optimization,
BO rate,
subgradients,
subgradient method,
stochastic subgradient,
suffix averaging,
(k+1) averaging,
weakly-convex rate,
tame function convergence,
smoothing,
adaptive smoothing,
cutting planes,
randomized center of gravity,
ignoring nonsmoothness,
bundle methods,
orthant-projected min-norm subgradient (Chapter 2)

April 21
 Proximal-Gradient and Fenchel Duality

Proximal-gradient (and acceleration),
active set complexity,
proximal PL,
group L1-regularization,
structured sparsity,
inexact proximal-gradient,
proximal average,
ADMM,
coordinate-wise proximal-gradient,
stochastic proximal-gradient,
regularized dual averaging,
proximal SVRG,
proximal Newton,
proximal point,
convex conjugate and duality (Section 3.3 and Chapter 5),
kernel methods,
Lipschitz-smoothness and strong-convexity duality,
Fenchel duality,
SDCA,
dual-free SDCA,
gap safe screening,
SVM safe screening
