notes on continuous optimisation (class 10/2):
(mostly based on James Spall, Introduction to Stochastic Search and Optimization, Wiley, 2003)
---
motivating examples:
- parameter optimisation
- optimisation of engineering designs (e.g., beam dimensions in bridge design)
- protein structure prediction
---
general continuous opt problem:
sln comp = real numbers; s = vector in R^n; g = continuous evaluation function R^n -> R
fundamental issues:
- discretisation (static, adaptive)
- dealing with continuous nbhd's (sampling from cont prob distr)
- gradient vs. gradient-free methods
- convergence to local vs. global optima
---
discretisation:
- Hutter et al. (ParamILS)
- Schulze-Kremer (GAs for Protein Folding)
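a minimal sketch of static discretisation of a continuous parameter (illustration only; the parameter, range, and grid size are made up, not taken from the ParamILS or protein-folding work):

  import numpy as np

  # static discretisation: fix a finite grid of candidate values per continuous parameter
  def discretise(lower, upper, n_values):
      return np.linspace(lower, upper, n_values)

  # e.g., a hypothetical penalty-weight parameter in [0.1, 10.0], 7 grid points
  grid = discretise(0.1, 10.0, 7)
  # an adaptive variant would refine the grid around the best value found so far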
---
dealing with continuous search neighbourhoods:
1. classical numerical optimisation methods:
- steepest descent (gradient descent; used in backprop-based training of NNs)
from given cand sln s_k, move in the direction of steepest descent (the negative gradient)
step: s_k+1 := s_k - a_k*g'(s_k)
a_k = step size (aka gain, learning coefficient/rate)
can be a constant, a decaying sequence,
or determined via line search, i.e., by solving the secondary opt problem a_k \in argmin_{a>=0} g(s_k - a*g'(s_k))
g = eval function; g' = dg/ds = gradient of g
note:
- convergence to local optima only; sensitive to transformation and scaling of g
- still widely used, provides the basis for many advanced methods
- g' can be difficult to obtain (or approximate)
-> see the code sketch after this subsection for both the steepest descent and Newton-Raphson updates
- Newton-Raphson algorithm (Newton's method):
idea: step size determined by 'local curvature' of g
step: s_k+1 := s_k - g''(s_k)^-1*g'(s_k)
g'' = d^2g/dsds^T = Hessian matrix of g
note:
- convergence to local opt only, but typically faster than steepest descent
- exact for quadratic functions (convergence to s^* in one step), but exactly quadratic g is uncommon in practice
- transform-invariant, unaffected by scaling of g
- typically good behaviour close to s^*, poor behaviour (stalling, divergence) far away from s^*
-> using additional scaling coefficient a_k for Hess. matrix can help stabilise
- g'' can be difficult to obtain (or approximate)
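a minimal sketch of both update rules on a hand-made quadratic test function (the function, starting point, and step size are illustration choices, not from Spall):

  import numpy as np

  # made-up quadratic test function g: R^2 -> R with known gradient and Hessian
  def g(s):    return (s[0] - 1)**2 + 2*(s[1] + 0.5)**2 + s[0]*s[1]
  def grad(s): return np.array([2*(s[0] - 1) + s[1], 4*(s[1] + 0.5) + s[0]])
  def hess(s): return np.array([[2.0, 1.0], [1.0, 4.0]])

  s_sd = np.array([5.0, 5.0])                 # steepest descent iterate
  s_nr = np.array([5.0, 5.0])                 # Newton-Raphson iterate
  a_k = 0.1                                   # constant step size (gain)
  for k in range(100):
      s_sd = s_sd - a_k * grad(s_sd)          # s_{k+1} := s_k - a_k * g'(s_k)
  for k in range(3):
      s_nr = s_nr - np.linalg.solve(hess(s_nr), grad(s_nr))   # s_{k+1} := s_k - g''(s_k)^-1 * g'(s_k)
  print(g(s_sd), g(s_nr))                     # Newton reaches the optimum of this quadratic in one step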
--
2. direct random search:
idea: use only information on g (not g', g'')
- simple random search (alg a from spall, ch.2):
- choose s_0 deterministically or uniformly at random
- generate s' based on s_k by sampling from prob distrib D(., s_k), which acts as a continuous nbhd
- if g(s') < g(s_k), s_k+1 := s', else s_k+1 := s_k
- termination as usual
D = uniform distr over S -> uniform random picking
D = multivar. Gaussian centred at s_k -> analogue of a random walk (probability density decaying exponentially with squared step size)
...
- equivalently (for the Gaussian choice of D): localised random search (alg b from spall, ch.2)
step: s' := s_k + d_k; with d_k sampled from multivariate prob distr over R^n (e.g., multivar normal)
d_k should have mean 0 and stddev of each component dependent on interval size for resp. sln component.
- enhanced localised random search (alg c from spall, ch.2; related to alg by Solis and Wets, 1981)
step: s' := s_k + d_k + b_k
where d_k is as above, b_k is a bias vector initialised at 0 and adapted according to search progress
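a minimal sketch of localised random search with the bias vector of alg c (objective, component scales, and the exact bias update are illustration choices; Spall / Solis-Wets give the precise adaptation scheme):

  import numpy as np

  rng = np.random.default_rng(0)
  def g(s): return float(np.sum((s - 0.3)**2))    # made-up objective on R^n

  n = 5
  scale = 0.1 * np.ones(n)                    # per-component stddev (illustration values)
  s_k = rng.uniform(-1.0, 1.0, n)             # s_0 chosen uniformly at random
  b_k = np.zeros(n)                           # bias vector initialised at 0
  for k in range(2000):
      d_k = rng.normal(0.0, scale)            # multivariate Gaussian step, mean 0
      s_new = s_k + d_k + b_k                 # alg c step (alg b: leave b_k at 0)
      if g(s_new) < g(s_k):
          s_k = s_new
          b_k = 0.2 * b_k + 0.4 * d_k         # shift bias toward successful step directions (assumed variant)
      else:
          b_k = 0.5 * b_k                     # shrink bias after a failed step (assumed variant)
  print(s_k, g(s_k))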
--
[also: nonlin simplex (nelder-mead alg); not directly related to the simplex alg for lin programming, but maintains a simplex (convex hull of n+1 points);
underlies fminsearch in MATLAB; see Spall, ch.2, sect. 2.4]
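nelder-mead is also available off the shelf, e.g., via SciPy; a minimal usage example on the standard 2-d Rosenbrock test function (not from the notes):

  from scipy.optimize import minimize

  rosen = lambda s: (1 - s[0])**2 + 100 * (s[1] - s[0]**2)**2   # gradient-free: only g values used
  res = minimize(rosen, x0=[-1.0, 2.0], method='Nelder-Mead')
  print(res.x, res.fun)                       # expect x close to [1, 1], value close to 0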
--
3. some more complex SLS methods:
3.1 SA (Spall, Ch.8)
- as in discrete case, use Metropolis acceptance criterion
- for proposal, sample from distribution over search space (or nbh),
e.g., by adding multivariate Gaussian step vector
(alternative: only change one component at a time, e.g., using univar. Gaussian)
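a minimal sketch of SA on a continuous space with a multivariate Gaussian proposal and the Metropolis acceptance criterion (objective, cooling schedule, and step scale are illustration choices):

  import numpy as np

  rng = np.random.default_rng(1)
  def g(s): return float(np.sum(s**2) + 2 * np.sum(np.cos(3 * s)))   # made-up multimodal objective

  n = 4
  s_k = rng.uniform(-3.0, 3.0, n)
  T, alpha, step = 2.0, 0.999, 0.3            # initial temperature, geometric cooling, proposal stddev
  for k in range(5000):
      s_new = s_k + rng.normal(0.0, step, n)  # Gaussian proposal step
      delta = g(s_new) - g(s_k)
      if delta < 0 or rng.random() < np.exp(-delta / T):   # Metropolis acceptance criterion
          s_k = s_new
      T *= alpha
  print(s_k, g(s_k))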
3.2 EAs (Spall, Ch.9+10)
- approach 1: discretisation: encode real numbers as bit vectors of desired accuracy (e.g., using Gray coding),
rest as usual
- approach 2: work directly on cont variables.
example: Evol. Strategy (no recombination):
- mutation by adding multivariate Gaussian step vector (Spall, p.261)
- produce \lambda offspring by mutation from current pop (e.g., randomly chosen, but
more complicated schemes, including recombination-like, are possible)
- elitist selection from resulting old + lambda new individuals
-> known as (N+lambda)-ES, where N = pop size (standard ES notation: (mu+lambda)-ES)
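a minimal sketch of an (N+lambda)-ES without recombination, matching the scheme above: Gaussian mutation of randomly chosen parents, then elitist selection from old + new individuals (objective, pop size, and mutation strength are illustration choices):

  import numpy as np

  rng = np.random.default_rng(2)
  def g(s): return float(np.sum((s - 1.0)**2))   # made-up objective (minimisation)

  dim, N, lam, sigma = 6, 10, 20, 0.3         # dimension, pop size N, lambda offspring, mutation stddev
  pop = rng.uniform(-5.0, 5.0, (N, dim))      # initial population
  for gen in range(200):
      parents = pop[rng.integers(0, N, lam)]                      # lambda parents, chosen at random
      offspring = parents + rng.normal(0.0, sigma, (lam, dim))    # mutation: add Gaussian step vector
      merged = np.vstack([pop, offspring])                        # old + lambda new individuals
      pop = merged[np.argsort([g(s) for s in merged])[:N]]        # elitist (N+lambda) selection
  print(pop[0], g(pop[0]))                    # best individual found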
--
4. alternative approach: use cont opt as subroutine in hybrid SLS method (which can otherwise use, e.g.,
discrete steps)
-> schaerf & di gaspero: application to (financial) portfolio optimisation;
use quadratic program solver as subroutine
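a generic sketch of the hybrid idea only (not Schaerf & Di Gaspero's actual algorithm): an outer discrete SLS swaps which assets are selected, and an inner continuous solver (here SciPy's SLSQP, standing in for a dedicated quadratic program solver) optimises the weights of the selected assets; all data and parameters are synthetic illustration values:

  import numpy as np
  from scipy.optimize import minimize

  rng = np.random.default_rng(3)
  n_assets, k = 8, 3                          # synthetic instance: select k of n_assets
  mu = rng.uniform(0.01, 0.10, n_assets)      # synthetic expected returns
  A = rng.normal(size=(n_assets, n_assets))
  cov = A @ A.T / n_assets                    # synthetic covariance matrix

  def best_weights(subset):
      # continuous subproblem: minimise risk minus return over weights of the selected assets
      m, c = mu[subset], cov[np.ix_(subset, subset)]
      obj = lambda w: w @ c @ w - m @ w
      res = minimize(obj, np.ones(k) / k, method='SLSQP',
                     bounds=[(0.0, 1.0)] * k,
                     constraints=[{'type': 'eq', 'fun': lambda w: np.sum(w) - 1.0}])
      return res.fun

  subset, cur = list(range(k)), best_weights(list(range(k)))
  for step in range(100):                     # outer discrete SLS: improving one-asset swaps
      cand = list(subset)
      cand[rng.integers(k)] = int(rng.integers(n_assets))
      if len(set(cand)) == k:
          val = best_weights(cand)
          if val < cur:
              subset, cur = cand, val
  print(subset, cur)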
---