CPSC 536H: Empirical Algorithmics (Spring 2012)
Notes by Holger H. Hoos, University of British Columbia

---------------------------------------------------------------------------------------------
Module 2: Deterministic decision procedures
---------------------------------------------------------------------------------------------

2.1 Deterministic decision algorithms

Given: Input data (e.g., a graph G and an integer k)
Objective: Output "yes" or "no" answer (e.g., to the question "is there a clique of size k in G?")

Other examples:
- primality: given an integer n, is n a prime number?
- SAT: given a propositional formula F, is there an assignment a of truth values to the variables in F such that F is true under a?
- scheduling: given a set of resources R, a set of tasks T with resource requirements, and a time t, can all tasks in T be accomplished within time t?

Note: Formally, decision problems can be represented by the set of all "yes" instances, i.e., the set of input data for which the answer is "yes" (this set may be hard or impossible to compute).

A decision algorithm is an algorithm for solving a given decision problem.
A decision algorithm is called deterministic if its behaviour is completely determined by the given input data.

Here:
- consider only error-free decision algorithms, i.e., algorithms that never give an incorrect answer
- consider only algorithms that terminate on every given problem instance
- focus on measuring the performance of an algorithm (typically the run-time required for solving a given problem instance, but possibly also the consumption of other resources, in particular memory)

Note: The concepts and techniques discussed here apply to all algorithms for which we are only interested in analysing running time (or a similar single resource consumed) when applied to given inputs.

---

2.2 Empirical analysis of a single decision algorithm

For a given instance:
- run algorithm and measure performance

Issues:
- choose performance measure
- control execution environment (see above)

For a set of problem instances:
- on each instance, run algorithm and measure performance
- analyse results

Solution cost distributions (SCDs):
- distribution of solution cost (running time) over a set of problem instances
- for instances obtained from a random generator: SCD = empirical probability distribution

General issue: analysing and summarising distributions

Graphical representations:
- CDFs (cumulative distribution functions) vs PDFs (probability density functions)
- CDFs are typically preferable [Why? Ask students. Answer: ...]
- use of log plots [examples in gnuplot]
- modes [What do modes mean? Ask students.]
- heavy (= long) vs fat tails

Def: random variable X has a heavy tail on the right iff P(X > x) \sim x^(-\alpha) with 0 < \alpha < 2
=> power-law decay of the right tail

fat tail = tail fatter than that of a Gaussian (= normal distribution) [as measured by kurtosis]

[What do fat / heavy tails mean? Ask students.]

Notes: heavy-tailed distributions
- have infinite variance and can have infinite mean (for \alpha <= 1)
  -> instability of sample means, ...
- have been used for modelling many phenomena, including network traffic and the behaviour of search algorithms for NP-hard problems
- are closely related to self-similar phenomena, self-similar structures
- heavy left tails??

[see also: http://en.wikipedia.org/wiki/The_long_tail, http://en.wikipedia.org/wiki/Long-range_dependency]
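[The following minimal Python sketch is illustrative only (synthetic data; assumes numpy and matplotlib): it plots the empirical CDFs of a light-tailed (exponential) and a heavy-tailed (Pareto, alpha = 1.5) sample on a log-scaled x-axis and illustrates the instability of sample means under heavy tails.]

    # Sketch: empirical CDFs of two synthetic "run-time" samples on a
    # log-scaled x-axis. The exponential sample has a light right tail; the
    # Pareto sample (alpha = 1.5, i.e., 0 < alpha < 2) is heavy-tailed:
    # its variance is infinite and sample means are unstable.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    n = 10000
    light = rng.exponential(scale=1.0, size=n)   # light right tail
    heavy = 1 + rng.pareto(1.5, size=n)          # classical Pareto, alpha = 1.5

    def ecdf(sample):
        # sorted values and empirical P(X <= x)
        x = np.sort(sample)
        return x, np.arange(1, len(x) + 1) / len(x)

    for sample, label in [(light, "exponential"), (heavy, "Pareto, alpha = 1.5")]:
        x, y = ecdf(sample)
        plt.step(x, y, where="post", label=label)

    plt.xscale("log")   # log plot spreads out the right tail
    plt.xlabel("solution cost (run-time)")
    plt.ylabel("P(X <= x)")
    plt.legend()
    plt.show()

    # instability of sample means under heavy tails: means of independent
    # subsamples fluctuate far more for the Pareto data
    for label, draw in [("exponential", lambda: rng.exponential(1.0, 1000)),
                        ("Pareto     ", lambda: 1 + rng.pareto(1.5, 1000))]:
        print(label, "subsample means:", [round(float(np.mean(draw())), 2) for _ in range(5)])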
- outliers
  def: x is an outlier if it is more than 1.5 times the inter-quartile range (IQR) from the closest quartile, i.e., min{|x - q_0.75|, |x - q_0.25|} > 1.5 * (q_0.75 - q_0.25)

Descriptive statistics (summarise distribution):

- location: mean, median, quantiles

  median for an even number of samples = average of the two middle values (or, by some conventions, the larger of the two)

  Def: p-quantile q_p of a random variable X = value x s.t. P(X <= x) >= p and P(X >= x) >= 1-p
  => q_0 = min, q_1 = max

  estimates for sample quantiles: various algorithms (rounding, interpolation) - not much of an issue for large samples

  frequently used quantiles: q_0.5 = median; q_0.25, q_0.75 = quartiles

  quantiles are often preferable over means [Why? Ask students. Answer: statistical stability]

- spread: variance / stddev, quantile ranges, quantile ratios

  quantile ranges or ratios are often preferable over variance / stddev

- higher moments:
  - (sample) skewness (3rd moment) = measure of asymmetry:
    sqrt(n) * sum_{i=1..n} (x_i - mean(x))^3 / [sum_{i=1..n} (x_i - mean(x))^2]^(3/2)
  - (sample) kurtosis (4th moment) = measure of 'peakedness' (also reflects 'fatness' of tails); as excess kurtosis:
    n * sum_{i=1..n} (x_i - mean(x))^4 / [sum_{i=1..n} (x_i - mean(x))^2]^2 - 3

Box plot (Tukey, 1977):
- box = q_0.25, q_0.75 (quartiles), line = median, whiskers = q_0.25 - 1.5*IQR and q_0.75 + 1.5*IQR, points = outliers (all of them)
[Draw illustration]
[see also http://web2.concordia.ca/Quality/tools/4boxplots.pdf]

Note:
- sometimes, extreme outliers are distinguished from mild ones, where extreme outliers are more than 3*IQR from the closest quartile
- variations of the concept exist, e.g., in the definition of the whiskers, additional indication of the mean, ...
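[A small Python sketch of the statistics defined above, on made-up run-times (illustrative only; assumes numpy): quartiles, IQR, outliers according to the 1.5*IQR rule, and sample skewness / excess kurtosis from the moment formulas.]

    # Sketch: descriptive statistics for a small sample of run-times,
    # following the definitions above (quartiles, IQR, the 1.5*IQR outlier
    # rule, and sample skewness / excess kurtosis from the moment formulas).
    import numpy as np

    x = np.array([0.3, 0.5, 0.6, 0.7, 0.8, 0.9, 1.1, 1.4, 2.0, 9.5])  # made-up run-times

    q25, med, q75 = np.quantile(x, [0.25, 0.5, 0.75])  # interpolated sample quantiles
    iqr = q75 - q25

    # outliers: more than 1.5 * IQR from the closest quartile
    outliers = x[(x < q25 - 1.5 * iqr) | (x > q75 + 1.5 * iqr)]

    d = x - x.mean()
    skewness = np.sqrt(len(x)) * np.sum(d**3) / np.sum(d**2) ** 1.5
    excess_kurtosis = len(x) * np.sum(d**4) / np.sum(d**2) ** 2 - 3

    print(f"median = {med:.3f}, quartiles = ({q25:.3f}, {q75:.3f}), IQR = {iqr:.3f}")
    print(f"outliers = {outliers}, skewness = {skewness:.3f}, excess kurtosis = {excess_kurtosis:.3f}")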
---

General issue: fundamental differences between normally distributed data and other types of distributions

Fraction of values within k*stddev of the mean:

normal distribution (68-95-99.7 rule):

  k    within    outside
  1    0.6827    0.3173
  2    0.9545    0.0455
  3    0.9973    0.0027
  4    0.9999    0.0001

(note: the fraction within k*stddev is given by the error function, erf(k/sqrt(2)))

  within    k
  0.8       1.2816
  0.9       1.6449
  0.95      1.9600
  0.99      2.5758
  0.999     3.2905
  0.9999    3.8906
  0.99999   4.4172

exponential distribution with rate lambda (mean = stddev = 1/lambda; CDF: F(x) = 1 - exp(-lambda*x)); for lambda = 1:

  k    within    outside
  1    0.8647    0.1353
  2    0.9502    0.0498
  3    0.9817    0.0183
  4    0.9933    0.0067

note: much higher likelihood of observing extreme values!

See also:
- http://en.wikipedia.org/wiki/Exponential_distribution
- http://en.wikipedia.org/wiki/Normal_distribution

--

Example: Run-time data from a study of a heuristic MAX-CLIQUE algorithm by Franco Mascia (Univ Trento)
[Show plot]

Note: SCDs on hard combinatorial problems are often not normally distributed; they often have very high variance and long tail(s)
-> be careful when using statistical tests!!

Characterisation by means of known, parametric distributions (function fitting) [discuss only briefly here]

When to summarise results on benchmark sets? [Ask students.]
- when instances come from a distribution (e.g., random number generator, other stochastic process)
- when dealing with a large number of instances
- caution: when looking at summary statistics only, it is sometimes easy to miss important effects - in particular, when summarising over heterogeneous test sets

---

2.3 Correlation between instance properties and performance

Goal: analyse / characterise the impact of instance properties on the performance of an algorithm

Simple qualitative analysis:
plot the correlation between a given instance property and performance, one data point per instance (scatter plot)

Simple quantitative analysis:
standard (Pearson) correlation coefficient r
- measures linear correlation only
- |r| = 1 <=> perfect linear correlation
- r = 0 <=> no linear correlation
- can use nonlinear transformations (particularly log, log-log) to test for non-linear dependencies

Question: When is an observed correlation statistically significant?
=> use a statistical hypothesis test to assess significance

--

Analyse scaling of performance with instance size:
- measure performance for various instance sizes
- exploratory analysis:
  - use a scaling plot for initial visual analysis (note: log / log-log plots can be very useful in this context)
  - fit one or more parametric models to the data points (e.g., a * e^(b*x), a * x^b, ...) using a continuous optimisation technique (e.g., the 'fit' function in gnuplot - caveat: local minima, divergence!); see the first sketch at the end of this section
    Practical tip: check all fits visually; fit multiple times with different initial values for the parameters, encouraging the optimiser to approach the best values from below and above.
  - RMSE (= root mean squared error = RMS of the residuals) can be used for an initial assessment of the relative fit of various models
- confirmatory analysis:
  - challenge the model by interpolation or extrapolation (i.e., compare predictions obtained from the model against actual data - the latter data must not have been used previously for fitting the model)
  - use bootstrapping to check the statistical significance of differences / agreement between the predictions and observations from the previous step (see the second sketch below)

bootstrapping for scaling analysis:
given performance measures for m problem instances per size,
for k times: draw performance measures for l instances per size, uniformly at random with replacement, and compute the statistic of interest (e.g., the median) on each such bootstrap sample
=> the resulting empirical distribution of k statistic values yields confidence intervals against which the model's predictions can be compared
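[First sketch - the exploratory fitting step in Python (illustrative only: data and starting values are made up; assumes numpy and scipy, whose curve_fit here plays the role of gnuplot's 'fit').]

    # Sketch: fit two candidate scaling models to (size, run-time) data by
    # least squares and compare them via RMSE. The same caveats as for
    # gnuplot's 'fit' apply: local minima, divergence -> try several
    # starting values p0 and inspect all fits visually.
    import numpy as np
    from scipy.optimize import curve_fit

    sizes = np.array([100., 150., 200., 250., 300., 350., 400.])  # made-up data
    times = np.array([0.10, 0.33, 0.98, 3.30, 9.80, 32.0, 99.0])

    def exp_model(x, a, b):   # a * e^(b*x)
        return a * np.exp(b * x)

    def poly_model(x, a, b):  # a * x^b
        return a * x ** b

    def rmse(model, params):  # root mean squared error of the residuals
        return float(np.sqrt(np.mean((model(sizes, *params) - times) ** 2)))

    for model, p0, name in [(exp_model, (0.01, 0.02), "a * e^(b*x)"),
                            (poly_model, (1e-11, 5.0), "a * x^b")]:
        params, _ = curve_fit(model, sizes, times, p0=p0, maxfev=20000)
        print(f"{name}: a={params[0]:.3g}, b={params[1]:.3g}, RMSE={rmse(model, params):.3f}")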
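[Second sketch - the bootstrapping step in Python (again illustrative, with made-up run-times; the median as statistic and 95% confidence intervals are choices for the example, not prescribed above).]

    # Sketch: percentile bootstrap for scaling analysis. For each instance
    # size, resample the m observed run-times with replacement k times,
    # compute the median of each bootstrap sample, and report a 95%
    # confidence interval. A model prediction falling outside this interval
    # at an unseen size would challenge the fitted scaling model.
    import numpy as np

    rng = np.random.default_rng(1)
    k = 1000  # number of bootstrap samples

    # made-up run-times for m = 8 instances at each of three sizes
    data = {200: [0.9, 1.0, 1.1, 1.0, 1.3, 0.8, 1.2, 5.0],
            300: [9.0, 10.5, 9.8, 11.0, 9.5, 30.0, 10.2, 9.9],
            400: [95.0, 101.0, 99.0, 160.0, 97.0, 102.0, 98.0, 100.5]}

    for size, times in data.items():
        times = np.asarray(times)
        stats = [np.median(rng.choice(times, size=len(times), replace=True))
                 for _ in range(k)]
        lo, hi = np.quantile(stats, [0.025, 0.975])
        print(f"size {size}: median = {np.median(times):.1f}, "
              f"95% bootstrap CI = ({lo:.1f}, {hi:.1f})")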