*minFunc* is a Matlab function for unconstrained optimization of differentiable
real-valued multivariate functions using line-search methods. It uses an
interface very similar to the Matlab Optimization Toolbox function *fminunc*,
and can be called as a replacement for this function.
On many problems, *minFunc* requires fewer function evaluations to converge than
*fminunc* (or *minimize.m*). Further it
can optimize problems with a much larger number of variables (*fminunc* is
restricted to several thousand variables), and uses
a line search that is robust to several common function pathologies.

The default parameters of *minFunc*
call a quasi-Newton strategy, where limited-memory BFGS updates with Shanno-Phua scaling are used in
computing the step direction, and a bracketing line-search for a point
satisfying the strong Wolfe conditions is used to compute the step direction. In the
line search, (safeguarded) cubic
interpolation is used to generate trial values, and the method switches to an
Armijo back-tracking line search on iterations where the objective function
enters a region where the
parameters do not produce a real valued output (i.e. complex, NaN, or Inf).

- Step directions can be computed based on: Exact Newton (requires user-supplied Hessian), full quasi-Newton approximation (uses a dense Hessian approximation), limited-memory BFGS (uses a low-rank Hessian approximation - default), (preconditioned) Hessian-free Newton (uses Hessian-vector products), (preconditioned) conjugate gradient (uses only previous step and a vector beta), Barzilai and Borwein (uses only previous step), or (cyclic) steepest descent.
- Step lengths can be computed based on either the (non-monotone) Armijo or Wolfe conditions, and trial values can be generated by either backtracking/bisection, or polynomial interpolation. Several strategies are available for selecting the initial trial value.
- Numerical differentiation and derivative checking are available, including an option for automatic differentiation using complex-step differentials (if the objective function code handles complex inputs).
- Most methods have user-modifiable parameters, such as the number of corrections to store for L-BFGS, modification options for Hessian matrices that are not positive-definite in the pure Newton method, choice of preconditioning and Hessian-vector product functions for the Hessian-free Newton method, choice of update method;scaling;preconditioning for the non-linear conjugate gradient method, the type of Hessian approximation to use in the quasi-Newton iteration, number of steps to look back for the non-monotone Armijo condition, the parameters of the line search algorithm, the parameters of the termination criteria, etc.

The function 'example_minFunc' gives an example of running the various limited-memory solvers in minFunc with default options on the 2D Rosenbrock "banana" function (it also runs minimize.m if it is found on the path). To run the example, in Matlab type:

Running the example should produce the following output:>> cd minFunc_2012 % Change to the unzipped directory >> addpath(genpath(pwd)) % Add all sub-directories to the path >> mexAll % Compile mex files (not necessary on all systems) >> example_minFunc % Run a demo trying to minimize the function

Result after 25 evaluations of limited-memory solvers on 2D rosenbrock: --------------------------------------- x1 = 0.0000, x2 = 0.0000 (starting point) x1 = 1.0000, x2 = 1.0000 (optimal solution) --------------------------------------- x1 = 0.8725, x2 = 0.7569 (minimize.m by C. Rasmussen) x1 = 0.3654, x2 = 0.1230 (minFunc with steepest descent) x1 = 0.4974, x2 = 0.2452 (minFunc with cyclic steepest descent) x1 = 0.8756, x2 = 0.7661 (minFunc with spectral gradient descent) x1 = 0.5840, x2 = 0.3169 (minFunc with Hessian-free Newton) x1 = 0.7478, x2 = 0.5559 (minFunc with preconditioned Hessian-free Newton) x1 = 1.0010, x2 = 1.0020 (minFunc with conjugate gradient) x1 = 0.7907, x2 = 0.6256 (minFunc with scaled conjugate gradient) x1 = 0.9794, x2 = 0.9491 (minFunc with preconditioned conjugate gradient) x1 = 1.0000, x2 = 1.0000 (minFunc with limited-memory BFGS - default)A note on the above performance results: The default parameters of minFunc were tuned based on log-linear statistical models (not on standard optimization problems like Rosenbrock's function).

To illustrate the use of Hessian-vector products and preconditioning, running *example_minFunc_LR* will show how to train a logistic regression classifier under various configurations of the Hessian-vector product and preconditioning function.

A very useful function that comes with the minFunc package is the *derivativeCheck* function that numerically checks the user-supplied gradient and/or Hessian, as well as the *fastDerivativeCheck* function which is more suitable for high-dimensional problems. Running *example_derivativeCheck* will show how to call these functions.

I have also prepared a more extensive set of examples of using minFunc on the webpage Examples of using minFunc.

- M. Schmidt.
**minFunc: unconstrained differentiable multivariate optimization in Matlab**. http://www.cs.ubc.ca/~schmidtm/Software/minFunc.html, 2005.

- A version of the
*mexAll*function for Octave can be downloaded here. - A version of the
*conjGrad*function that fixes the function header can be downloaded here.

- The L-BFGS code was completely re-written to be faster, to require less memory, and to be more stack-friendly.
- Excised the derivative checking code into a new function,
*derivativeCheck*, that can be called directly to check the user-supplied gradient and/or Hessian. This function now also uses central-differencing by default, in order to return higher-accuracy results. Also added the*fastDerivativeCheck*function, which uses a a random direction to quickly test the quality of the user-supplied gradient using only two or three function evaluations (thanks to Nicolas Le Roux for this suggestion). This is useful for high-dimensional problems where many function evaluations would be costly, but it has the disadvantage that it does not return an approximation of the gradient in order to help debugging the gradient code. - The
*LS*option was replaced by the*LS_type*and*LS_interp*options. The*LS_type*option specifies whether the Armijo or the Wolfe line-search will be used, and the*LS_interp*function specifies the type of interpolation/extrapolation strategy. - Added the
*LS_multi*option. If this is turned on then the Armijo line-search will use more than two points to build a higher-order interpolating function (quadratic, cubic, quartic, or quintic depending on the situation). - Changed the names of the confusing-to-me
*tolX*and*tolFun*variables to the more-simple-to-me*optTol*and*progTol*. The*optTol*variable specifies a first-order optimality tolerance, and the*progTol*variable specifies a lack-of-progress tolerance. - The extrapolation procedure was modified to suppress the (harmless) condition-number warning message than many users encountered.
- Fixed a problem where the Wolfe line-search could prematurely detect a lack of progress.
- The user callback function can now tell minFunc to stop.
- The 64-bit windows mex files were added, thanks to Rainer Heintzmann for these files.

- Removed bounding of the initial step length in the line search (except on the first iteration). This sometimes caused slow convergence for functions with large gradient values.
- Changed the optimality tolerance parameters; tolFun is now only used to measure first-order optimality conditions, while tolX is now used for measures of lack of progress.
- Added the 'cyclic' steepest descent method, which performs an accurate line search, but then performs several iterations with the same step size under an inaccurate line search.
- To prevent the non-linear conjugate gradient method from restarting so often, this method was modified to accept the conjugate gradient step whenever a sufficient decrease condition is satisfied.
- Added the 'scaled' conjugate gradient method, where a Hessian-vector product is used to give a good initialization of the line search.
- Added the option to use the eigenvector associated with the most negative eigenvalue as a negative curvature direction, as an alternative to Hessian modification strategies.
- When the linear conjugate gradient method terminates because of negative curvature in the newton0 method, the option is now available to use the negative curvature direction as the search direction (now the default).
- Added a preconditioned non-linear conjugate gradient method.
- Added the ability to call user-specified preconditioners.
- Changed the name of the 'bfgs' method to 'qnewton', as there are now a bunch of Hessian approximations available via the 'qnUpdate' option: BFGS updates, SR1 updates (falls back on BFGS if it doesn't maintain positive-definiteness), Hoshino updates, Self-Scaling BFGS (now the default), Oren's Self Scaling Variable Metric Method, and the McCormick-Huang asymmetric update.
- Removed the constrained optimization codes. For constrained optimization, see minConf.
- The 64-bit linux mex file 'lbfgsC.mexa64' was added. Thanks to Emt Khan for this file.
- The 64-bit windows mex files 'lbfgsC.mexw64' and 'mcholC.mexw64' were added, as well as an updated version of 'mcholC.c' that allows this file to be compiled under 64-Windows machines. Thanks to Rainer Heintzmann for these files.
- The mex files 'lbfgsC.mexmaci', 'lbfgsC.mexmaci64', and 'mcholC.mexmaci64' were added. Thanks to Jascha Sohl-Dickstein for these files.

- Input options are now case insensitive.
- The Wolfe line search now switches to an Armijo line search if the function returns an illegal (complex, NaN, or Inf) value (previously, this was one of the biggest causes of problems and you had to make the switch manually for any function that you suspected might overflow or have other problems).
- The Wolfe line search is now more robust (thanks to Glenn Fung, Mirela Andronescu, and Frank Hutter for helping me find some pathological cases).
- Added the ability to have user-specified Hessian-vector product functions in the Hessian-Free Newton method.
- Added a mex version of the LDL^T factorization (often the fastest Hessian modification strategy).
- Made finite-differencing the default numerical differentiation technique (this is less accurate but more broadly applicable than the complex-step derivative approach).
- Added a Tensor method that uses a Tensor of 3rd derivatives to potentially give improved convergence (this is mainly for entertainment purposes since I can't imagine it being practically useful).
- Added three constrained optimization codes that use some of minFunc's functionality to solve constrained optimization problems. 'minFuncBC' solves problems with (upper/lower) bounds on the variables, 'minFuncEC' solves problems with linear equality constraints, and 'minFuncIC' solves problems with linear inequality constraints (this one is not as efficient).

- The user-supplied gradient code is wrong. It is recommended to use
the
*derivativeCheck*options (or function) to make sure that the gradient code roughly matches numerically computed derivatives. - minFunc runs out of memory. By default, minFunc uses a large number of corrections in the L-BFGS method. For problems with a very large number of variables, the 'Corr' parameter should be decreased (i.e. if you run out of memory on the 10th iteration, try setting 'Corr' to 9 or lower).
- minFunc returns before a solution of satisfactory accuracy is achieved. In this case, decreasing optTol and/or progTol will typically fix the problem, but in some cases the objective function or variables need to be re-scaled.
- Each iteration requires many function evaluations. In this case, changing the line search initialization (options.LS_init) from its default value can sometimes improve performance (i.e. setting it to 2 will use a quadratic approximation to initialize the line search).
- You get the error "??? Undefined function or method 'lbfgsAddC'". This occurs if
Matlab can't find the mex file for your operating system. If this occurs, you can either: (i) set 'options.useMex' to 0, or (ii) compile the mex file for your operating system by running the
*mexAll*function. If you end up compiling a file in minFunc for your operating system, please send it to me and I will add to the distribution. - The objective function is not appropriate (this is a modeling issue, not an optimization problem). You need to make sure that a minimum of the function is actually the result you want, and that you don't need constraints.
- When using the pure Newton method, the default options in minFunc assume that the Hessian is positive-definite or close to positive-definite. If this is not the case, the 'hessianModify' option may need to be changed (to 1, 2, or 3, for example) in order to obtain good performance.

I have also written trust-region, derivative-free, and coordinate descent codes. In some scenarios, the second-order trust-region code requires fewer function evaluations than minFunc (based on line-searches), while some of the derivative-free methods can be applied to problems that are continuous but not necessarily differentiable (i.e. if the function uses absolute values). The coordinate descent method can use additional structure present in the problem to (for very structured problems) give better performance. These methods are not available in the on-line package, contact me if you want these additional methods.

myObj = @(x)f(y,x,arg1,arg2); minFunc(myObj,x_init);

Mark Schmidt > Software > minFunc