Empirical Algorithmics (Spring 2006)
ICT International Doctorate School, Università degli Studi di Trento
Notes by Holger H. Hoos, University of British Columbia

---------------------------------------------------------------------------
Module 3: Randomised algorithms without error (Las Vegas algorithms)
          for decision problems [1.5*2h]
---------------------------------------------------------------------------

3.1 Introduction

Many efficient heuristic algorithms are randomised.
Examples: Simulated Annealing (SA), Genetic Algorithms (GA), routing algorithms, randomised quicksort, ...

Definition: An algorithm A is called a Las Vegas Algorithm (LVA) iff
- the output of A on any problem instance is guaranteed to be correct;
- A is randomised, i.e., for a given instance (and parameter settings), the run-time of A is a random variable.

LVAs can be
- complete: for each instance i, there exists a bound t(i) on the run-time required for solving i
  Example: randomised quicksort (randomised choice of pivot)
- probabilistically approximately complete (PAC): for any instance i, as run-time -> infinity, the probability of finding a (correct) solution to i approaches 1
  Example: spinning a roulette wheel until obtaining result '0'
  (Also many stochastic local search algorithms, e.g., SA, are PAC.)
- essentially incomplete: there exist instances i for which, as run-time -> infinity, the probability of finding a (correct) solution to i is bounded from above by some p(i) < 1
  Example: many simple randomised local search algorithms

Here: consider only complete and PAC algorithms (i.e., those that terminate with probability 1 as run-time -> infinity); essentially incomplete algorithms are discussed later (module 4).

---

3.2 Run-time distributions

Note: The behaviour of an LVA A on a given problem instance i is completely characterised by the probability distribution of the run-time of A on i, RT_A,i.

Formally: Given a Las Vegas algorithm A,
- the success probability P_s(RT_A,i <= t) is the probability that A finds a solution for instance i in time <= t;
- the run-time distribution (RTD) of A on i is the probability distribution of the random variable RT_A,i;
- the run-time distribution function rtd: pos reals -> [0,1] is defined as rtd(t) = P_s(RT_A,i <= t).

RTDs form the basis for the empirical analysis of LVAs.

[slides: illustrations of RTD plots]

---

3.3 Empirical analysis of a single algorithm (single instance + ensembles)

Protocol for obtaining the empirical RTD for an LVA A applied to a given instance i of a decision problem [-> slide]

Note: Obtaining sufficiently stable descriptive statistics requires the same number of runs of the given algorithm as measuring reasonably accurate empirical RTDs.

The RTD of a given LVA applied to a single problem instance can be analysed using the same techniques as discussed for SCDs (module 2): RTD graphs, descriptive statistics, box plots.

--

Basic quantitative analysis for ensembles of instances:
- In principle, the same approach as for individual instances is applicable: measure the empirical RTD for each instance, analyse using RTD plots or descriptive statistics.
- In many cases, the RTDs for a set of instances have similar shapes or share important features (e.g., being uni- or bi-modal, or having a prominent right tail).
- Select a typical instance for presentation or further analysis; briefly summarise the data for the remaining instances.
- For bigger sets of instances (e.g., samples from random instance distributions), it is important to characterise the performance of the given algorithm on individual instances as well as across the entire ensemble.
- Report and analyse run-time distributions on representative instance(s) as well as the search cost distribution (SCD), i.e., the distribution of basic RTD statistics (e.g., median or mean) across the given instance ensemble.
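To make the notions of run-time distribution and success probability concrete, here is a minimal sketch based on the roulette-wheel PAC example from 3.1: the run-time is the number of spins until '0' comes up (assuming a 37-pocket European wheel), and the empirical RTD is the step function rtd(t) = fraction of runs finishing in time <= t. The wheel model, sample size and seed are assumptions chosen for illustration.

```python
import random

def roulette_lva(rng):
    """Las Vegas 'algorithm' from 3.1: spin a 37-pocket roulette wheel
    until the result is 0; the run-time is the number of spins."""
    spins = 0
    while True:
        spins += 1
        if rng.randrange(37) == 0:
            return spins

def empirical_rtd(run_times):
    """Empirical RTD: the step function rtd(t) = P_s(RT <= t),
    estimated as the fraction of recorded runs with run-time <= t."""
    n = len(run_times)
    xs = sorted(run_times)
    return lambda t: sum(1 for x in xs if x <= t) / n

rng = random.Random(42)
run_times = [roulette_lva(rng) for _ in range(1000)]  # 1000 independent runs
rtd = empirical_rtd(run_times)
for t in (10, 37, 100):  # success probability for a few cutoff times t
    print(t, round(rtd(t), 3))
```

Since the success probability per spin is 1/37, the underlying theoretical RTD is geometric, with rtd(37) = 1 - (36/37)^37, roughly 0.64; the empirical values should scatter around that.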
[-> slide]

- For the analysis of SCDs, use the same techniques as discussed in module 2.
- In particular, for sets of instances that have been generated by systematically varying a parameter (e.g., problem size), study RTD characteristics in dependence of the parameter value.

--

General issue: characterising distributions using parametric distribution functions
Here: applied to RTDs

- Empirical RTDs are step functions that approximate the underlying theoretical RTDs.
- For reasonably large sample sizes (numbers of runs), empirical RTDs can often be approximated well using much simpler continuous mathematical functions.
- Such functional approximations are useful for summarising and mathematically modelling empirically observed behaviour, which often provides deeper insights into LVA behaviour.
- Approximations with parameterised families of continuous distribution functions known from statistics, such as exponential or normal distributions, are particularly useful.
- Model fitting techniques, such as the Marquardt-Levenberg or Expectation Maximisation algorithms, can be used to find good approximations of empirical RTDs with parameterised cumulative distribution functions. (To do this in practice, use, e.g., the 'fit' command in gnuplot.) [-> slide]
- The quality of approximations can be assessed using statistical goodness-of-fit tests, such as the Kolmogorov-Smirnov goodness-of-fit test (discussed later). [-> slide]
- This approach can easily be generalised to ensembles of problem instances.

Note:
- Particularly for small or easy problem instances, the quality of optimal functional approximations can sometimes be limited by the inherently discrete nature of empirical RTD data.
- Fitting more complex functions can be tricky (due to limitations of the continuous optimisation techniques underlying model fitting algorithms). [-> slide]

--

Stagnation, parallelisation and restart strategies

- Detailed RTD analyses can often suggest ways of improving the performance of a given SLS algorithm.
- Static restarting, i.e., periodic re-initialisation at all integer multiples of a given cutoff time t', is one of the simplest methods for overcoming stagnation behaviour.
- A static restart strategy is effective, i.e., leads to increased solution probability for some run-time t'', if the RTD of the given algorithm and problem instance is less steep than an exponential distribution crossing the RTD at some time t < t''. [-> slide]

To determine the optimal cutoff time topt for static restarts, consider the left-most exponential distribution that touches the given empirical RTD and choose topt to be the smallest t value at which the two respective distribution curves meet. (For a formal derivation of topt, see page 193 of SLS:FA.)

Note: This method for determining optimal cutoff times only works a posteriori, given an empirical RTD. Optimal cutoff times for static restarting typically vary considerably between problem instances;
-> use dynamic restart strategies or other algorithmic techniques to overcome stagnation behaviour (see, e.g., Ch. 4 of SLS:FA).

Multiple independent runs parallelisation

- Any LVA A can easily be parallelised by performing multiple runs on the same problem instance i in parallel on p processors.
- The effectiveness of this approach depends on the RTD of A on i: the optimal parallelisation speedup of p is achieved for an exponential RTD.
- The RTDs of many high-performance stochastic local search algorithms are well approximated by exponential distributions; however, deviations for short run-times (due to the effects of search initialisation) limit the maximal number of processors for which optimal speedup can be achieved in practice.
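The connection between restarts, parallelisation and exponential RTDs can be illustrated numerically. Under static restarting with cutoff t', the success probability after k runs is 1 - (1 - F(t'))^k. The sketch below compares a (memoryless) exponential RTD, for which restarting changes nothing, against a hypothetical stagnating RTD, modelled here as a mixture in which half of all runs never succeed; both distribution functions and all parameter values are made-up for illustration, not taken from the notes.

```python
import math

def p_success_with_restarts(F, t_prime, k):
    """Success probability of static restarting: k independent runs,
    each cut off after time t', where F is the RTD of a single run."""
    return 1.0 - (1.0 - F(t_prime)) ** k

def F_exp(t):
    """Exponential RTD (memoryless); scale parameter 144 is arbitrary."""
    return 1.0 - math.exp(-t / 144.0)

def F_stag(t):
    """Hypothetical 'stagnating' RTD: with probability 0.5 a run gets
    stuck and never succeeds, so F is bounded from above by 0.5."""
    return 0.5 * (1.0 - math.exp(-t / 10.0))

T, cutoff = 1000.0, 50.0
k = int(T / cutoff)  # number of restarted runs fitting into the budget T
for name, F in (("exponential", F_exp), ("stagnating", F_stag)):
    print(name, round(F(T), 6), round(p_success_with_restarts(F, cutoff, k), 6))
```

For the exponential RTD the two probabilities coincide exactly (memorylessness), which is also why p independent parallel runs achieve optimal speedup p in that case; for the stagnating RTD, restarting raises the success probability within budget T from about 0.5 to nearly 1.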
---

3.4 Comparative analysis of LVAs (single instance + ensembles)

Question: Given two LVAs A and B, does A perform better than B on a given problem instance i?
[ask students about challenge]

Def. (Probabilistic domination) Algorithm A probabilistically dominates algorithm B on problem instance i iff
(1) \forall t: P(RT_A <= t) >= P(RT_B <= t)
(2) \exists t: P(RT_A <= t) > P(RT_B <= t)

Graphical criterion: the RTD of A is 'above' that of B (in a CDF plot).

Situations where there is no probabilistic domination between A and B are reflected by crossing RTD curves (in CDF plots). [draw figure]
[ask students about meaning of crossing RTDs]

---

General issue: comparing the medians of two distributions for significant differences

Mann-Whitney U-test = Wilcoxon rank sum test (alternative to the two-sample t-test)
given: two samples A, B
H_0: med_A = med_B
computation of the U statistic:
1. label and rank the pooled observations from both samples (break ties in favour of sample A)
2. inspect each B observation and count the number of A's preceding it -> U_A
3. inspect each A observation and count the number of B's preceding it -> U_B
4. U := min{U_A, U_B}

Note: does not require a normality assumption (for normal distributions - very unusual for RTDs - it is preferable to use the two-sample t-test)
[-> http://geographyfieldwork.com/Mann%Whitney.htm]

in R: wilcox.test(rtdA$V2, rtdB$V2, paired=FALSE)

--

General issue: comparing two distributions for equality
(this arises, e.g., when analysing whether changing a parameter value has any impact on a given LVA)

Kolmogorov-Smirnov goodness-of-fit test
given: two samples
H_0: same underlying distribution
statistic: D = maximum vertical distance between the empirical CDFs
note: restricted to continuous distributions; less sensitive than the t-test when the latter's assumptions are satisfied; not affected by changes of scale or log transformation of the data; can also be used to compare one sample against a model distribution
[-> http://www.physics.csbsju.edu/stats/KS-test.html; http://www.itl.nist.gov/div898/handbook/eda/section3/eda35g.htm]

in R: ks.test(rtdA$V2, rtdB$V2)

Note: does not make any assumptions about the type or shape of the distribution

--

Goal: Compare the performance of Las Vegas algorithms A and B on a given ensemble of instances.

- Use instance-based analysis to partition the given ensemble into three subsets:
  - instances on which A probabilistically dominates B;
  - instances on which B probabilistically dominates A;
  - instances on which there is no probabilistic domination between A and B (crossing RTDs).
  The size of these subsets gives a rather detailed picture of the algorithms' relative performance on the given ensemble.
- Use statistical tests to assess the significance of performance differences across the given instance ensemble.
  - Use the Wilcoxon matched pairs signed-rank test (see module 2) on paired medians of RTDs (or, equivalently, the binomial sign test).
  - Note: This does not capture qualitative performance differences such as different shapes of the underlying RTDs and can easily miss interesting variation in relative performance across the ensemble.
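The two per-instance tools of this section, the U statistic (steps 1-4 of the Mann-Whitney U-test) and the probabilistic domination criterion, can be sketched in a few lines of pure Python. The run-time samples below are made up for illustration; in practice one would use the R calls given above.

```python
def mann_whitney_u(a, b):
    """U statistic as in steps 1-4 above: for each B observation, count
    the A observations preceding it in the pooled ranking (-> U_A), and
    vice versa (-> U_B); ties are broken in favour of sample A.
    Returns U = min{U_A, U_B}."""
    u_a = sum(1 for x in a for y in b if x <= y)  # A's preceding each B
    u_b = sum(1 for y in b for x in a if y < x)   # B's strictly preceding each A
    return min(u_a, u_b)

def probabilistically_dominates(rt_a, rt_b):
    """Check the probabilistic domination criterion on two empirical RTDs:
    P(RT_A <= t) >= P(RT_B <= t) for all t, strict for at least one t."""
    cdf = lambda sample, t: sum(1 for x in sample if x <= t) / len(sample)
    ts = sorted(set(rt_a) | set(rt_b))  # all observed run-times
    diffs = [cdf(rt_a, t) - cdf(rt_b, t) for t in ts]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)

# made-up run-time samples for two hypothetical LVAs A and B
rt_a = [2, 3, 5, 7, 11]
rt_b = [4, 6, 8, 9, 12]
print(mann_whitney_u(rt_a, rt_b))             # small U suggests a median difference
print(probabilistically_dominates(rt_a, rt_b))
```

A useful sanity check on the counting scheme: every one of the |A|*|B| pairs is counted exactly once, so U_A + U_B = |A|*|B|.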
Particularly for large instance ensembles, it is often useful to study the correlation between the performance of A and B across the ensemble.
-> Use RTD statistics and methods from module 2
[ask students for some details -> scatter plots, correlation coefficients, ...]

---

Learning goals (for module 3):
- be able to explain the concept and definition of a Las Vegas algorithm
- be able to measure and analyse RTDs of LVAs on single and multiple problem instances
- know how the concepts of RTDs and SCDs are related
- be able to characterise RTDs by fitting known parametric distributions and to evaluate the quality of fit
- be able to explain how the analysis of RTDs can help to improve the performance of an LVA
- be able to explain the connection between the effectiveness of parallelising a given LVA and the shape of its RTDs
- be able to explain the concept and definition of probabilistic domination and its application in the comparative analysis of LVAs
- be able to name and use appropriate tests for detecting significant performance differences of LVAs on single and multiple problem instances
- be able to explain how performance correlations between LVAs can be analysed