Empirical Algorithmics (Spring 2006)
ICT International Doctorate School, Università degli Studi di Trento
Notes by Holger H. Hoos, University of British Columbia

---------------------------------------------------------------------------
Module 3: Randomised algorithms without error (Las Vegas algorithms)
          for decision problems [1.5*2h]
---------------------------------------------------------------------------

3.1 Introduction

Many efficient heuristic algorithms are randomised.
Examples: Simulated Annealing (SA), Genetic Algorithms (GA), routing algorithms, randomised quicksort, ...

Definition: An algorithm A is called a Las Vegas Algorithm (LVA) iff
- the output of A on any problem instance is guaranteed to be correct;
- A is randomised, i.e., for a given instance (and parameter settings), the run-time of A is a random variable.

LVAs can be
- complete: for each instance i, there exists a bound t(i) on the run-time required for solving i
  Example: randomised quicksort (randomised choice of pivot)
- probabilistically approximately complete (PAC): for any instance i, as run-time -> infinity, the probability of finding a (correct) solution to i approaches 1
  Example: spinning a roulette wheel until obtaining result '0'
  (Also many stochastic local search algorithms, e.g., SA, are PAC.)
- essentially incomplete: there exist instances i for which, as run-time -> infinity, the probability of finding a (correct) solution to i is bounded from above by some p(i) < 1
  Example: many simple randomised local search algorithms

Here: consider only complete and PAC algorithms (i.e., those that terminate with probability 1 as run-time -> infinity); essentially incomplete algorithms are discussed later (module 4).

---

3.2 Run-time distributions

Note: The behaviour of an LVA A on a given problem instance i is completely characterised by the probability distribution of the run-time of A on i, RT_A,i.

Formally: Given a Las Vegas algorithm A,
- the success probability P_s(RT_A,i <= t) is the probability that A finds a solution for instance i in time <= t;
- the run-time distribution (RTD) of A on i is the probability distribution of the random variable RT_A,i;
- the run-time distribution function rtd: pos reals -> [0,1] is defined as rtd(t) = P_s(RT_A,i <= t).

RTDs form the basis for the empirical analysis of LVAs.

[slides: illustrations of RTD plots]

---

3.3 Empirical analysis of a single algorithm (single instance + ensembles)

Protocol for obtaining the empirical RTD for an LVA A applied to a given instance i of a decision problem [-> slide]

Note: Obtaining sufficiently stable descriptive statistics requires the same number of runs of the given algorithm as measuring reasonably accurate empirical RTDs.

The RTD of a given LVA applied to a single problem instance can be analysed using the same techniques as discussed for SCDs (module 2): RTD graphs, descriptive statistics, box plots.

--

Basic quantitative analysis for ensembles of instances:
- In principle, the same approach as for individual instances is applicable: measure the empirical RTD for each instance, analyse using RTD plots or descriptive statistics.
- In many cases, the RTDs for a set of instances have similar shapes or share important features (e.g., being uni- or bi-modal, or having a prominent right tail).
- Select a typical instance for presentation or further analysis; briefly summarise the data for the remaining instances.
- For bigger sets of instances (e.g., samples from random instance distributions), it is important to characterise the performance of the given algorithm on individual instances as well as across the entire ensemble.
- Report and analyse run-time distributions on representative instance(s) as well as the search cost distribution (SCD), i.e., the distribution of basic RTD statistics (e.g., median or mean) across the given instance ensemble.
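To make the notions of run-time distribution and success probability concrete, here is a minimal sketch based on the roulette-wheel PAC example from 3.1: the run-time is the number of spins until '0' comes up (assuming a 37-pocket European wheel), and the empirical RTD is the step function rtd(t) = fraction of runs finishing in time <= t. The wheel model, sample size and seed are assumptions chosen for illustration.

```python
import random

def roulette_lva(rng):
    """Las Vegas 'algorithm' from 3.1: spin a 37-pocket roulette wheel
    until the result is 0; the run-time is the number of spins."""
    spins = 0
    while True:
        spins += 1
        if rng.randrange(37) == 0:
            return spins

def empirical_rtd(run_times):
    """Empirical RTD: the step function rtd(t) = P_s(RT <= t),
    estimated as the fraction of recorded runs with run-time <= t."""
    n = len(run_times)
    xs = sorted(run_times)
    return lambda t: sum(1 for x in xs if x <= t) / n

rng = random.Random(42)
run_times = [roulette_lva(rng) for _ in range(1000)]  # 1000 independent runs
rtd = empirical_rtd(run_times)
for t in (10, 37, 100):  # success probability for a few cutoff times t
    print(t, round(rtd(t), 3))
```

Since the success probability per spin is 1/37, the underlying theoretical RTD is geometric, with rtd(37) = 1 - (36/37)^37, roughly 0.64; the empirical values should scatter around that.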
[-> slide]

- For the analysis of SCDs, use the same techniques as discussed in module 2.
- In particular, for sets of instances that have been generated by systematically varying a parameter (e.g., problem size), study RTD characteristics in dependence of the parameter value.

--

General issue: characterising distributions using parametric distribution functions
Here: applied to RTDs

- Empirical RTDs are step functions that approximate the underlying theoretical RTDs.
- For reasonably large sample sizes (numbers of runs), empirical RTDs can often be approximated well using much simpler continuous mathematical functions.
- Such functional approximations are useful for summarising and mathematically modelling empirically observed behaviour, which often provides deeper insights into LVA behaviour.
- Approximations with parameterised families of continuous distribution functions known from statistics, such as exponential or normal distributions, are particularly useful.
- Model fitting techniques, such as the Marquardt-Levenberg or Expectation Maximisation algorithms, can be used to find good approximations of empirical RTDs with parameterised cumulative distribution functions. (To do this in practice, use, e.g., the 'fit' command in gnuplot.) [-> slide]
- The quality of approximations can be assessed using statistical goodness-of-fit tests, such as the Kolmogorov-Smirnov goodness-of-fit test (discussed later). [-> slide]
- This approach can easily be generalised to ensembles of problem instances.

Note:
- Particularly for small or easy problem instances, the quality of optimal functional approximations can sometimes be limited by the inherently discrete nature of empirical RTD data.
- Fitting more complex functions can be tricky (due to limitations of the continuous optimisation techniques underlying model fitting algorithms). [-> slide]

--

Stagnation, parallelisation and restart strategies

- Detailed RTD analyses can often suggest ways of improving the performance of a given SLS algorithm.
- Static restarting, i.e., periodic re-initialisation at all integer multiples of a given cutoff time t', is one of the simplest methods for overcoming stagnation behaviour.
- A static restart strategy is effective, i.e., leads to increased solution probability for some run-time t'', if the RTD of the given algorithm and problem instance is less steep than an exponential distribution crossing the RTD at some time t < t''. [-> slide]

To determine the optimal cutoff time topt for static restarts, consider the left-most exponential distribution that touches the given empirical RTD and choose topt to be the smallest t value at which the two respective distribution curves meet. (For a formal derivation of topt, see page 193 of SLS:FA.)

Note: This method for determining optimal cutoff times only works a posteriori, given an empirical RTD. Optimal cutoff times for static restarting typically vary considerably between problem instances;
-> use dynamic restart strategies or other algorithmic techniques to overcome stagnation behaviour (see, e.g., Ch. 4 of SLS:FA).

Multiple independent runs parallelisation

- Any LVA A can easily be parallelised by performing multiple runs on the same problem instance i in parallel on p processors.
- The effectiveness of this approach depends on the RTD of A on i: the optimal parallelisation speedup of p is achieved for an exponential RTD.
- The RTDs of many high-performance stochastic local search algorithms are well approximated by exponential distributions; however, deviations for short run-times (due to the effects of search initialisation) limit the maximal number of processors for which optimal speedup can be achieved in practice.
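The connection between restarts, parallelisation and exponential RTDs can be illustrated numerically. Under static restarting with cutoff t', the success probability after k runs is 1 - (1 - F(t'))^k. The sketch below compares a (memoryless) exponential RTD, for which restarting changes nothing, against a hypothetical stagnating RTD, modelled here as a mixture in which half of all runs never succeed; both distribution functions and all parameter values are made-up for illustration, not taken from the notes.

```python
import math

def p_success_with_restarts(F, t_prime, k):
    """Success probability of static restarting: k independent runs,
    each cut off after time t', where F is the RTD of a single run."""
    return 1.0 - (1.0 - F(t_prime)) ** k

def F_exp(t):
    """Exponential RTD (memoryless); scale parameter 144 is arbitrary."""
    return 1.0 - math.exp(-t / 144.0)

def F_stag(t):
    """Hypothetical 'stagnating' RTD: with probability 0.5 a run gets
    stuck and never succeeds, so F is bounded from above by 0.5."""
    return 0.5 * (1.0 - math.exp(-t / 10.0))

T, cutoff = 1000.0, 50.0
k = int(T / cutoff)  # number of restarted runs fitting into the budget T
for name, F in (("exponential", F_exp), ("stagnating", F_stag)):
    print(name, round(F(T), 6), round(p_success_with_restarts(F, cutoff, k), 6))
```

For the exponential RTD the two probabilities coincide exactly (memorylessness), which is also why p independent parallel runs achieve optimal speedup p in that case; for the stagnating RTD, restarting raises the success probability within budget T from about 0.5 to nearly 1.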
---

3.4 Comparative analysis of LVAs (single instance + ensembles)

Question: Given two LVAs A and B, does A perform better than B on a given problem instance i?
[ask students about challenge]

Def. (Probabilistic domination) Algorithm A probabilistically dominates algorithm B on problem instance i iff
(1) \forall t: P(RT_A <= t) >= P(RT_B <= t)
(2) \exists t: P(RT_A <= t) > P(RT_B <= t)

Graphical criterion: the RTD of A is 'above' that of B (in a CDF plot).

Situations where there is no probabilistic domination between A and B are reflected by crossing RTD curves (in CDF plots). [draw figure]
[ask students about meaning of crossing RTDs]

---

General issue: comparing the medians of two distributions for significant differences

Mann-Whitney U-test = Wilcoxon rank sum test (alternative to the two-sample t-test)
given: two samples A, B
H_0: med_A = med_B
computation of the U statistic:
1. label and rank the pooled observations from both samples (break ties in favour of sample A)
2. inspect each B observation and count the number of A's preceding it -> U_A
3. inspect each A observation and count the number of B's preceding it -> U_B
4. U := min{U_A, U_B}

Note: does not require a normality assumption (for normal distributions - very unusual for RTDs - it is preferable to use the two-sample t-test)
[-> http://geographyfieldwork.com/Mann%Whitney.htm]

in R: wilcox.test(rtdA$V2, rtdB$V2, paired=FALSE)

--

General issue: comparing two distributions for equality
(this arises, e.g., when analysing whether changing a parameter value has any impact on a given LVA)

Kolmogorov-Smirnov goodness-of-fit test
given: two samples
H_0: same underlying distribution
statistic: D = maximum vertical distance between the empirical CDFs
note: restricted to continuous distributions; less sensitive than the t-test when the latter's assumptions are satisfied; not affected by changes of scale or log transformation of the data; can also be used to compare one sample against a model distribution
[-> http://www.physics.csbsju.edu/stats/KS-test.html; http://www.itl.nist.gov/div898/handbook/eda/section3/eda35g.htm]

in R: ks.test(rtdA$V2, rtdB$V2)

Note: does not make any assumptions about the type or shape of the distribution

--

Goal: Compare the performance of Las Vegas algorithms A and B on a given ensemble of instances.

- Use instance-based analysis to partition the given ensemble into three subsets:
  - instances on which A probabilistically dominates B;
  - instances on which B probabilistically dominates A;
  - instances on which there is no probabilistic domination between A and B (crossing RTDs).
  The size of these subsets gives a rather detailed picture of the algorithms' relative performance on the given ensemble.
- Use statistical tests to assess the significance of performance differences across the given instance ensemble.
  - Use the Wilcoxon matched pairs signed-rank test (see module 2) on paired medians of RTDs (or, equivalently, the binomial sign test).
  - Note: This does not capture qualitative performance differences such as different shapes of the underlying RTDs and can easily miss interesting variation in relative performance across the ensemble.
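The two per-instance tools of this section, the U statistic (steps 1-4 of the Mann-Whitney U-test) and the probabilistic domination criterion, can be sketched in a few lines of pure Python. The run-time samples below are made up for illustration; in practice one would use the R calls given above.

```python
def mann_whitney_u(a, b):
    """U statistic as in steps 1-4 above: for each B observation, count
    the A observations preceding it in the pooled ranking (-> U_A), and
    vice versa (-> U_B); ties are broken in favour of sample A.
    Returns U = min{U_A, U_B}."""
    u_a = sum(1 for x in a for y in b if x <= y)  # A's preceding each B
    u_b = sum(1 for y in b for x in a if y < x)   # B's strictly preceding each A
    return min(u_a, u_b)

def probabilistically_dominates(rt_a, rt_b):
    """Check the probabilistic domination criterion on two empirical RTDs:
    P(RT_A <= t) >= P(RT_B <= t) for all t, strict for at least one t."""
    cdf = lambda sample, t: sum(1 for x in sample if x <= t) / len(sample)
    ts = sorted(set(rt_a) | set(rt_b))  # all observed run-times
    diffs = [cdf(rt_a, t) - cdf(rt_b, t) for t in ts]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)

# made-up run-time samples for two hypothetical LVAs A and B
rt_a = [2, 3, 5, 7, 11]
rt_b = [4, 6, 8, 9, 12]
print(mann_whitney_u(rt_a, rt_b))             # small U suggests a median difference
print(probabilistically_dominates(rt_a, rt_b))
```

A useful sanity check on the counting scheme: every one of the |A|*|B| pairs is counted exactly once, so U_A + U_B = |A|*|B|.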
Particularly for large instance ensembles, it is often useful to study the correlation between the performance of A and B across the ensemble.
-> Use RTD statistics and methods from module 2
[ask students for some details -> scatter plots, correlation coefficients, ...]

---

Learning goals (for module 3):
- be able to explain the concept and definition of a Las Vegas algorithm
- be able to measure and analyse RTDs of LVAs on single and multiple problem instances
- know how the concepts of RTDs and SCDs are related
- be able to characterise RTDs by fitting known parametric distributions and to evaluate the quality of fit
- be able to explain how the analysis of RTDs can help to improve the performance of an LVA
- be able to explain the connection between the effectiveness of parallelising a given LVA and the shape of its RTDs
- be able to explain the concept and definition of probabilistic domination and its application in the comparative analysis of LVAs
- be able to name and use appropriate tests for detecting significant performance differences of LVAs on single and multiple problem instances
- be able to explain how performance correlations between LVAs can be analysed