# L1General

Mark Schmidt (2006)

L1General is a set of Matlab routines implementing several of the available strategies for solving L1-regularization problems. Specifically, they solve the problem of optimizing a differentiable function f(x) and a (weighted) sum of the absolute values of the parameters:

min_x: f(x) + sum_i v_i |x_i|

The v_i must be non-negative, but can be zero if we do not want to penalize some variables. There is a lot of interest in solving problems of this form from a variety of communities, such as signal processing and statistics. This is because the penalty on the absolute values of the variables encourages the solution to be sparse (many variables are set to exactly 0).

The absolute value function is not differentiable if a variable is zero. This makes the optimization problem harder than solving differentiable optimization problems. However, the absolute value function has a lot of nice properties, and there are a wide variety of interesting approaches for solving L1-regularization problems. Usually these focus on the case where f(x) is the linear squared error function or the logistic regression negative log-likelihood.

Instead of focusing on a specific form of f(x), the L1General software only assumes that the user can provide a 'black box' function that returns f(x) and its gradient for a given parameter setting. This is similar to optimization codes for differentiable optimization like minFunc. The purpose of writing the L1General codes was to compare the performance of different optimization strategies in this black box setting. Although treating the function as a black box means that the codes don't fully take advantage of the structure of the objective, this perspective makes it very easy to use the codes to solve a variety of different L1-regularization problems.

### Usage

All the methods have a common interface:
```>> x = L1General2_PSSgb(funObj,x0,lambda,options);
```
The input parameters are:
• funObj: an anonymous function that returns the function and gradient value given a parameter setting 'x'. The second-order methods also require funObj to return the Hessian.
• x0: an initial guess of the optimal solution. Some of the methods will take substantially fewer iterations if this is close to the optimal solution.
• lambda: a vector that is the same size as x0, giving the (non-negative) weights on the penalties on the absolute values of the parameters. Often, these will all be set to the same positive value. To avoid penalizing a variable, set the corresponding element of this vector to 0.
• options: a structure that specifies any non-default parameter settings you want the method to use. This is an optional argument.
You can replace L1General2_PSSgb with one of the other functions to use a different method. For example, you could use L1General_Projection instead. Roughly, the (newer) 'L1General2' methods only require funObj to return the function and gradient value, while the (older) 'L1General' methods also require that funObj returns the matrix of second derivatives. Alternately, the older methods use a BFGS approximation if you set options.order = 1. The list of available methods is given in the updates section of this webpage.

To use the code, download and unzip L1General.zip. Then, in Matlab, type:
```>> cd L1General          % Change to the unzipped directory
>> addpath(genpath(pwd)) % Add all directories to the path
>> mexAll                % Compile mex files (not necessary on all systems)
>> example_L1General     % Runs a demo of the (older) L1General codes
>> demo_L1General        % Runs a demo of the (newer) L1General codes
>> demo_L1Generalpath    % Runs a demo of computing a regularization path
```
The first two demos apply a variety of the different methods available to solve the same problem with the older and newer codes (respectively), pausing between running each method. The third demo shows how to embed the codes in an active-set continuation method for handling a larger number of variables and/or for regularization path calculation.

I prepared a more extensive set of examples of using L1General on the webpage Examples of using L1General. Rather than comparing the performance of different optimization methods, these demos demonstrate the 'plug and play' nature of the code. The full list of demos is:

• LASSO
• Elastic Net
• Logistic Regression
• Logistic Regression (larger number of variables)
• Lasso Regularization Path
• Huber Robust Regression Regularization Path
• Logistic Regression Regularization Path
• Probit Regression, Smooth SVM, Huberized SVM
• Non-Parameteric Logistic Regression with Sparse Prototypes
• Multinomial Logistic Regression
• Compressed Sensing
• Chain-structured conditional random field
• Graphical Lasso
• Markov Random Field Structure Learning
• Neural Network with Sparse Connections
• Deep Neural Network with Sparse Connections
The code, some documentation, and examples of the results of these demos can be found here. Note that these demos use the older code but any of the newer codes could be used in its place.

### Citations

If you use this code in a publication, please cite one of the following works
• Mark Schmidt. Graphical Model Structure Learning with L1-Regularization. Ph.D. Thesis, University of British Columbia, 2010.
• Mark Schmidt, Glenn Fung, Romer Rosales. Optimization Methods for L1-Regularization. UBC Technical Report TR-2009-19, 2009.
• Mark Schmidt, Glenn Fung, Romer Rosales. Fast Optimization Methods for L1 Regularization: A Comparative Study and Two New Approaches. European Conference on Machine Learning (ECML), 2007.

### Warm-Starting and Sub-Optimization

Note that the performance of many of the methods is strongly dependent on how close the initial parameter vector x0 is to the optimal solution. Further, some methods may terminate before reaching a minimizer if they are not started sufficiently close to one.

One way to obtain a good initial guess is to use the solution for a close value of lambda. When solving for multiple values of lambda, this warm-starting strategy can often substantially reduce the number of iterations. Another way to improve performance is to only optimize over a subset of non-zero variables. This sub-optimization step may be able to make substantial progress while only computing the partial derivatives with respect to a subset of the variables. However, this requires extra coding on the part of the user as it violates the black-box principle behind the code. Both warm-starting and sub-optimization are demonstrated in the special case of logistic regression in the demo_L1Generalpath demo.

There have been several updates to the L1General software:

#### 2010

This update contains the 'L1General2' codes used in my thesis:
• Mark Schmidt. Graphical Model Structure Learning with L1-Regularization. Ph.D. Thesis, University of British Columbia, 2010 (pdf)
The methods available in L1General2 are:
• L1General2_SPG: Spectral projected gradient.
• L1General2_BBST: Barzilai-Borwein soft-threshold.
• L1General2_BBSG: Barzilai-Borwein sub-gradient.
• L1General2_OPG: Optimal projected gradient.
• L1General2_DSST: Diagonally-scaled soft-threshold.
• L1General2_OWL: Orthant-wise learning.
• L1General2_AS: Active set.
• L1General2_TMP: Two-metric projection.
• L1General2_PSSgb: Projected scaled sub-gradient (Gafni-Bertsekas variant).
• L1General2_PSSsp: Projected scaled sub-gradient (sign projection variant).
• L1General2_PSSas: Projected scaled sub-gradient (active set variant).
The L1General2 codes focus on limited-memory (L-BFGS) versions of the most effective methods from L1General, as well as other available first-order solvers.

This update also added several strategies to the older L1General code:

• L1GeneralCoordinateDescent: Added a fully greedy variant.
• L1GeneralPatternSearch: A generalization of the pattern-search method.
• L1GeneralProjectedSubGradient: A two-metric sub-gradient projection method.
• L1GeneralProjectedSubGradientBB: A projected sub-gradient method with the Barzilai-Borwein step size.
• L1GeneralCompositeGradientAccelerated: An implementation of the accelerated iterative soft-thresholding method.

#### 2008

A update of the original codes and several variants added, set up as an on-line appendix to the technical report:
• Mark Schmidt, Glenn Fung, Romer Rosales. Optimization Methods for L1-Regularization. UBC Technical Report TR-2009-19, 2009 (pdf).
The major updates and new methods added were:
• L1GeneralSubGradient: Added a variant that greedily adds a variable after each iteration.
• L1GeneralOrthantWise: An implementation of the orthant-wise method.
This version added variants of all methods that don't require the exact Hessian but instead use a BFGS approximation (this feature can be turned on by setting options.order = 1), and can still be downloaded here

#### 2007

The first web release of L1General, set up as an on-line appendix to the paper:
• Mark Schmidt, Glenn Fung, Romer Rosales. Fast Optimization Methods for L1 Regularization: A Comparative Study and Two New Approaches. European Conference on Machine Learning (ECML), 2007 (pdf).
The methods present in this initial release were:
• L1GeneralCoordinateDescent: Cyclic and partially greedy coordinate descent methods based on an exact line search.
• L1GeneralGrafting: A active-set sub-gradient method that greedily adds a variable after each sub-optimizaiton.
• L1GeneralSubGradient: A naive scaled sub-gradient method.
• L1GeneralUnconstrainedApx: Solving a smoothed version of the problem with an unconstrained optimizer (several smoothing methods and a continuation strategy are available).
• L1GeneralIteratedRidge: A bound-optimization approach based on a scale-mixture representation of the regularizer.
• L1GeneralSequentialQuadraticProgramming: A projected Newton approach applied to a bound-contrained re-formulation.
• L1GeneralProjection: A two-metric projection approach applied to a bound-constrained re-formulation.
• L1GeneralPrimalDualLogBarrier: A primal-dual interior-point method applied to a bound-constrained re-formulation.
This older paper focused on scenarios where the Hessian matrix is available, and can still be downloaded here.

### Group L1-Regularization

If you want to solve general 'group' L1-regularization problems, several methods are available via the L1GeneralGroup and minConf packages in my thesis code.

Mark Schmidt > Software > L1General