notes on continuous optimisation (class 10/2):
(mostly based on James Spall, Introduction to Stochastic Search and Optimization, Wiley, 2003)

---

motivating examples:
- parameter optimisation
- optimisation of engineering designs (e.g., beam dimensions in bridge design)
- protein structure prediction

---

general continuous optimisation problem:
solution components = real numbers; s = vector in R^n; g = n-dimensional continuous function R^n -> R

fundamental issues:
- discretisation (static, adaptive)
- dealing with continuous neighbourhoods (sampling from continuous probability distributions)
- gradient vs. gradient-free methods
- convergence to local vs. global optima

---

discretisation:
- Hutter et al. (ParamILS)
- Schulze-Kremer (GAs for Protein Folding)

---

dealing with continuous search neighbourhoods:

1. classical numerical optimisation methods:

- steepest descent (corresponds to backprop for NNs)
  from given candidate solution s, move along the direction of steepest descent
  step: s_{k+1} := s_k - a_k * g'(s_k)
  a_k = step size (aka gain, learning coefficient/rate); can be a constant, a decaying sequence, or determined using line search, i.e., by solving the secondary optimisation problem of determining a_k \in argmin_{a >= 0} g(s_k - a * g'(s_k))
  g = evaluation function; g' = dg/ds = gradient of g
  note:
  - convergence to local optima only; sensitive to transformation and scaling of g
  - still widely used, provides the basis for many advanced methods
  - g' can be difficult to obtain (or approximate)

- Newton-Raphson algorithm (Newton's method):
  idea: step size determined by 'local curvature' of g
  step: s_{k+1} := s_k - g''(s_k)^-1 * g'(s_k)
  g'' = d^2 g / (ds ds^T) = Hessian matrix of g
  note:
  - convergence to local optima only, but typically faster than steepest descent
  - exact solution for quadratic functions (convergence to s^* in one step), but this is uncommon in practice
  - transformation-invariant, unaffected by scaling of g
  - typically good behaviour close to s^*, poor behaviour (stalling, divergence) away from s^* -> using an additional scaling coefficient a_k for the Hessian matrix can help stabilise
  - g'' can be difficult to obtain (or approximate)

  (a minimal steepest descent / Newton example is sketched after section 2 below)

--

2. direct random search:
idea: use only information on g (not g', g'')

- simple random search (algorithm A from Spall, ch. 2):
  - choose s_0 deterministically or uniformly at random
  - generate s' based on s_k by sampling from probability distribution D(s, s_k) = continuous neighbourhood
  - if g(s') < g(s_k), s_{k+1} := s', else s_{k+1} := s_k
  - termination as usual
  D = uniform distribution over S -> uniform random picking
  D = multivariate Gaussian -> analogue of uniform random walk (with probability decaying exponentially with step size)
  ...

- equivalently: localised random search (algorithm B from Spall, ch. 2)
  step: s' := s_k + d_k, with d_k sampled from a multivariate probability distribution over R^n (e.g., multivariate normal)
  d_k should have mean 0 and the standard deviation of each component should depend on the interval size of the respective solution component
  (see the second sketch after this section)

- enhanced localised random search (algorithm C from Spall, ch. 2; related to the algorithm by Solis and Wets, 1981)
  step: s' := s_k + d_k + b_k
  where d_k is as above and b_k is a bias vector, initialised at 0 and adapted according to search progress

--

[also: nonlinear simplex (Nelder-Mead algorithm); not directly related to the simplex method for linear programming, but uses the idea of a convex hull; underlies fminsearch in MATLAB; see Spall, ch. 2, 2.4]

--
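a minimal sketch (not from Spall) of the two classical methods from section 1, on a simple quadratic g(s) = 0.5 s^T A s - b^T s; the names A, b, g, grad_g, hess_g and the specific numbers are illustrative assumptions:

```python
# sketch: steepest descent with exact line search, and one Newton step,
# on a quadratic objective g(s) = 0.5 s^T A s - b^T s (illustrative only)
import numpy as np

A = np.array([[3.0, 0.5],
              [0.5, 1.0]])          # positive definite -> unique minimiser s* = A^-1 b
b = np.array([1.0, -2.0])

def g(s):      return 0.5 * s @ A @ s - b @ s
def grad_g(s): return A @ s - b     # g'(s)
def hess_g(s): return A             # g''(s); constant for a quadratic

# steepest descent: s_{k+1} = s_k - a_k * g'(s_k), a_k by exact line search
s = np.zeros(2)
for k in range(20):
    d = grad_g(s)
    a = (d @ d) / (d @ A @ d)       # argmin_{a>=0} g(s_k - a * g'(s_k)) for a quadratic
    s = s - a * d
print("steepest descent:", s, g(s))

# Newton-Raphson: s_{k+1} = s_k - g''(s_k)^-1 * g'(s_k); exact in one step here
s0 = np.zeros(2)
s = s0 - np.linalg.solve(hess_g(s0), grad_g(s0))
print("Newton (one step):", s, g(s))
```

for this quadratic the Newton step lands exactly on s^* = A^-1 b, which is the one-step convergence mentioned in the notes above; the line-search formula for a_k only holds for quadratics.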
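a minimal sketch of localised random search (algorithm B from section 2), assuming an isotropic Gaussian step distribution with a fixed scale sigma; the function names, parameters, and test function are illustrative, not taken from Spall:

```python
# sketch: localised random search with Gaussian steps (illustrative assumptions)
import numpy as np

def localised_random_search(g, s0, sigma=0.1, max_iter=10_000, seed=None):
    rng = np.random.default_rng(seed)
    s = np.asarray(s0, dtype=float)
    g_s = g(s)
    for _ in range(max_iter):
        d = rng.normal(0.0, sigma, size=s.shape)  # d_k ~ N(0, sigma^2 I), mean 0
        s_new = s + d                             # s' := s_k + d_k
        g_new = g(s_new)
        if g_new < g_s:                           # accept s' only if it improves g
            s, g_s = s_new, g_new
    return s, g_s

# usage on a simple test function (assumed example, not from the notes)
if __name__ == "__main__":
    sphere = lambda x: float(np.sum(x ** 2))
    best, best_val = localised_random_search(sphere, s0=np.full(5, 3.0), seed=1)
    print(best, best_val)
```

the enhanced variant (algorithm C) would additionally add an adaptive bias vector b_k to each step; in the notes above, how b_k is adapted is left to Spall / Solis and Wets.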
3. some more complex SLS methods:

3.1 simulated annealing (SA) (Spall, ch. 8)
- as in the discrete case, use the Metropolis acceptance criterion
- for the proposal, sample from a distribution over the search space (or neighbourhood), e.g., by adding a multivariate Gaussian step vector (alternative: only change one component at a time, e.g., using a univariate Gaussian)

3.2 evolutionary algorithms (EAs) (Spall, ch. 9+10)
- approach 1: discretisation: encode real numbers into bit vectors of the desired accuracy (e.g., using Gray coding), rest as usual
- approach 2: work directly on continuous variables. example: Evolution Strategy (no recombination):
  - mutation by adding a multivariate Gaussian step vector (Spall, p. 261)
  - produce lambda offspring by mutation from the current population (e.g., randomly chosen parents, but more complicated schemes, including recombination-like ones, are possible)
  - elitist selection from the resulting old + lambda new individuals
  -> known as (N+lambda)-ES, where N = population size (see the sketch at the end of these notes)

--

4. alternative approach: use continuous optimisation as a subroutine in a hybrid SLS method (which can otherwise use, e.g., discrete steps)
-> Schaerf & Di Gaspero: application to (financial) portfolio optimisation; use a quadratic program solver as a subroutine

---
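a minimal sketch of a (N+lambda)-ES without recombination, as in section 3.2: Gaussian mutation of randomly chosen parents, then elitist ("plus") selection; parameter names, the initialisation range, and the test function are illustrative assumptions, not from Spall:

```python
# sketch: (N+lambda)-ES with Gaussian mutation and elitist selection (illustrative)
import numpy as np

def plus_es(g, dim, pop_size=10, n_offspring=20, sigma=0.3,
            n_generations=200, seed=None):
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-5.0, 5.0, size=(pop_size, dim))        # initial population (assumed range)
    fitness = np.array([g(ind) for ind in pop])
    for _ in range(n_generations):
        parents = pop[rng.integers(0, pop_size, size=n_offspring)]   # randomly chosen parents
        offspring = parents + rng.normal(0.0, sigma, size=parents.shape)  # Gaussian mutation
        off_fit = np.array([g(ind) for ind in offspring])
        combined = np.vstack([pop, offspring])                 # old + lambda new individuals
        comb_fit = np.concatenate([fitness, off_fit])
        keep = np.argsort(comb_fit)[:pop_size]                 # elitist selection (minimisation)
        pop, fitness = combined[keep], comb_fit[keep]
    return pop[0], fitness[0]                                  # best individual and its value

# usage on a simple test function (assumed example, not from the notes)
if __name__ == "__main__":
    sphere = lambda x: float(np.sum(x ** 2))
    best, best_val = plus_es(sphere, dim=5, seed=1)
    print(best_val)
```

here sigma is kept fixed; practical ES variants typically adapt the mutation scale during the run, which is beyond what these notes cover.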