L1General

Mark Schmidt (2006)

L1General is a set of Matlab routines implementing several of the available strategies for solving L1-regularization problems. Specifically, they solve the problem of optimizing a differentiable function f(x) and a (weighted) sum of the absolute values of the parameters:

min_x: f(x) + sum_i v_i |x_i|

The v_i must be non-negative, but can be zero if we do not want to penalize some variables. There is a lot of interest in solving problems of this form from a variety of communities, such as signal processing and statistics. This is because the penalty on the absolute values of the variables encourages the solution to be sparse (many variables are set to exactly 0).

The absolute value function is not differentiable if a variable is zero. This makes the optimization problem harder than solving differentiable optimization problems. However, the absolute value function has a lot of nice properties, and there are a wide variety of interesting approaches for solving L1-regularization problems. Usually these focus on the case where f(x) is the linear squared error function or the logistic regression negative log-likelihood.

Instead of focusing on a specific form of f(x), the L1General software only assumes that the user can provide a 'black box' function that returns f(x) and its gradient for a given parameter setting. This is similar to optimization codes for differentiable optimization like minFunc. The purpose of writing the L1General codes was to compare the performance of different optimization strategies in this black box setting. Although treating the function as a black box means that the codes don't fully take advantage of the structure of the objective, this perspective makes it very easy to use the codes to solve a variety of different L1-regularization problems.

Usage

All the methods have a common interface:
>> x = L1General2_PSSgb(funObj,x0,lambda,options);
The input parameters are: You can replace L1General2_PSSgb with one of the other functions to use a different method. For example, you could use L1General_Projection instead. Roughly, the (newer) 'L1General2' methods only require funObj to return the function and gradient value, while the (older) 'L1General' methods also require that funObj returns the matrix of second derivatives. Alternately, the older methods use a BFGS approximation if you set options.order = 1. The list of available methods is given in the updates section of this webpage.

Download and Examples

To use the code, download and unzip L1General.zip. Then, in Matlab, type:
>> cd L1General          % Change to the unzipped directory
>> addpath(genpath(pwd)) % Add all directories to the path
>> mexAll                % Compile mex files (not necessary on all systems)
>> example_L1General     % Runs a demo of the (older) L1General codes
>> demo_L1General        % Runs a demo of the (newer) L1General codes
>> demo_L1Generalpath    % Runs a demo of computing a regularization path
The first two demos apply a variety of the different methods available to solve the same problem with the older and newer codes (respectively), pausing between running each method. The third demo shows how to embed the codes in an active-set continuation method for handling a larger number of variables and/or for regularization path calculation.

I prepared a more extensive set of examples of using L1General on the webpage Examples of using L1General. Rather than comparing the performance of different optimization methods, these demos demonstrate the 'plug and play' nature of the code. The full list of demos is:

The code, some documentation, and examples of the results of these demos can be found here. Note that these demos use the older code but any of the newer codes could be used in its place.

Citations

If you use this code in a publication, please cite one of the following works

Warm-Starting and Sub-Optimization

Note that the performance of many of the methods is strongly dependent on how close the initial parameter vector x0 is to the optimal solution. Further, some methods may terminate before reaching a minimizer if they are not started sufficiently close to one.

One way to obtain a good initial guess is to use the solution for a close value of lambda. When solving for multiple values of lambda, this warm-starting strategy can often substantially reduce the number of iterations. Another way to improve performance is to only optimize over a subset of non-zero variables. This sub-optimization step may be able to make substantial progress while only computing the partial derivatives with respect to a subset of the variables. However, this requires extra coding on the part of the user as it violates the black-box principle behind the code. Both warm-starting and sub-optimization are demonstrated in the special case of logistic regression in the demo_L1Generalpath demo.

Updates

There have been several updates to the L1General software:

2019

Updated the "mexAll.m" function to be compatible with the current versions of Matlab, by adding the "-compatibleArrayDims" flag (thanks to Hatice Ulku Osmanbeyoglu).

2010

This update contains the 'L1General2' codes used in my thesis: The methods available in L1General2 are: The L1General2 codes focus on limited-memory (L-BFGS) versions of the most effective methods from L1General, as well as other available first-order solvers.

This update also added several strategies to the older L1General code:

2008

A update of the original codes and several variants added, set up as an on-line appendix to the technical report: The major updates and new methods added were: This version added variants of all methods that don't require the exact Hessian but instead use a BFGS approximation (this feature can be turned on by setting options.order = 1), and can still be downloaded here

2007

The first web release of L1General, set up as an on-line appendix to the paper: The methods present in this initial release were: This older paper focused on scenarios where the Hessian matrix is available, and can still be downloaded here.

Group L1-Regularization

If you want to solve general 'group' L1-regularization problems, several methods are available via the L1GeneralGroup and minConf packages in my thesis code.


Mark Schmidt > Software > L1General