CPSC 536H: Empirical Algorithmics (Spring 2012)
Notes by Holger H. Hoos, University of British Columbia

---------------------------------------------------------------------------------------
Module 6: General considerations for empirical design methods
---------------------------------------------------------------------------------------

6.1 What makes a good algorithm (for solving a given problem)?

Correctness: need to have correct behaviour
  -> *testing and debugging*
  what about algorithms with error? randomised algorithms? parallel algorithms?

Performance: avg/peak/robustness = consistency (outliers)
  have already discussed how to assess performance, but how do we achieve good
  performance? -> modules 7-10 (various methods)

Reproducibility of behaviour:
  - dependence on execution environment (OS platform, machine, ...)
    - already covered in module 1
  - randomness: is it a problem? *random number generators / seeds*
    will see benefits in module 8

Configurability / parameters: many = good (or perhaps bad?)

Ease of implementation, evolvability & reusability:
  initial implementation vs modifications vs reuse
  -> software engineering topics, but they gain special importance in the context of
     automated design methods

Theoretical guarantees - covered (to some extent) in module 10

Note:
- typically, there is no single best algorithm for a given problem
  -> parameters and configuration (module 7)
     algorithm selection (module 9)
     portfolios (module 10)
- generic improvements (i.e., ones independent of the details of the problem or
  algorithm) are often possible - this is what we focus on in the following
- we are interested in principled, automated design methods
  -> algorithms (fully formalised, automated procedures) that create, transform and
     analyse algorithms (target algorithms)
  -> meta-algorithmics - the study of such meta-algorithms

---

6.2 Ease of implementation, evolvability & reusability

- how to measure? [ask students] person hours? description length? ...
- rapid prototyping, modularity, algorithm frameworks
- Occam's razor: conceptual simplicity vs performance
- automated algorithm design: code bloat, optimisation for simplicity,
  quantified trade-off between performance and complexity

---

6.3 Some notes on testing & debugging

Goal: ascertain/improve correctness of an algorithm / implementation using empirical
methods (as opposed to mathematical proofs of correctness)

To achieve this: iteratively detect & remedy incorrect behaviour (bugs)

Debugging = the process of identifying and correcting the root cause of a failure
[Zeller, 2000]

--

Fuzz testing (or fuzzing)
[see http://en.wikipedia.org/wiki/Fuzz_testing, http://pages.cs.wisc.edu/~bart/fuzz/]

key idea: automatically find circumstances under which programs fail (bugs)
by providing automatically constructed, random(ised) input data.
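A minimal mutation-based fuzzing loop might look roughly like the Python sketch below
(the target program, seed file and crash criterion are hypothetical placeholders;
how fuzzers are realised in practice is discussed in more detail in the notes that follow):

    # minimal mutation-based fuzzing sketch (hypothetical target command & seed file)
    # repeatedly mutates a seed input by flipping random bits, runs the target
    # program on the mutated input, and records inputs that make it crash
    import random
    import subprocess

    TARGET_CMD = ["./target_program"]   # hypothetical program under test; takes an input file
    SEED_FILE = "seed_input.bin"        # hypothetical existing (non-empty) valid input

    def mutate(data: bytes, num_flips: int = 16) -> bytes:
        buf = bytearray(data)
        for _ in range(num_flips):
            pos = random.randrange(len(buf))
            buf[pos] ^= 1 << random.randrange(8)   # flip one random bit
        return bytes(buf)

    def fuzz(iterations: int = 1000) -> None:
        seed_data = open(SEED_FILE, "rb").read()
        for i in range(iterations):
            candidate = mutate(seed_data)
            with open("fuzz_input.bin", "wb") as f:
                f.write(candidate)
            result = subprocess.run(TARGET_CMD + ["fuzz_input.bin"], capture_output=True)
            if result.returncode < 0:   # on POSIX: negative return code = killed by a signal (crash)
                print("iteration %d: crash (signal %d)" % (i, -result.returncode))
                with open("crash_%d.bin" % i, "wb") as f:
                    f.write(candidate)

    if __name__ == "__main__":
        fuzz()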
realisation: a software tool (fuzzer) iteratively constructs inputs and runs the given
program, recording the result (run failed or successful)

widely used in industry; most common targets: file formats and protocols,
but applicable to all types of program input
enhances software safety & security

oldest and simplest form (Miller et al., 1990): input = random stream of bits

more advanced:
- (random) transformations (= mutations) of existing inputs (from test suites,
  real-life data), e.g. by flipping bits at random, moving/swapping chunks
  (such as blocks of files)
- grammar-based generation or transformation of input
- white-box fuzzing [see http://research.microsoft.com/en-us/projects/atg/ndss2008.pdf],
  evolutionary fuzzing [see http://www.vdalabs.com/tools/efs_gpf.html]

can focus on valid, invalid and mostly-valid inputs

The first paper on fuzz testing:
B.P. Miller, L. Fredriksen, and B. So: An Empirical Study of the Reliability of UNIX
Utilities. Communications of the ACM 33(12), December 1990.
(originated from a grad course project in 1988!)

Further information/literature: http://pages.cs.wisc.edu/~bart/fuzz/

--

Delta debugging
[see http://www.st.cs.uni-saarland.de/dd/]
[material here mostly based on Andreas Zeller: From Automated Testing to Automated
Debugging (2000)]

note [Zeller, 2000]: "debugging is experimental science: noticing something, one
wonders why it happens, and one sets up a number of hypotheses which one confirms
or refutes by means of experiments."
-> scientific method!

goal: automate the scientific method of debugging

key idea: isolate failure causes (= failure-inducing circumstances) automatically,
by systematically narrowing down failure-inducing circumstances until a minimal set
remains (minimal: cannot be reduced any further)
-> delta debugging algorithm

applications: isolation of failure-inducing
- program input (e.g., HTML pages that cause failure of a web browser,
  CNF formulae for which a SAT solver gives an incorrect answer, ...)
- user interaction (e.g., keystrokes that cause a program crash)
- changes to the program code (e.g., after a failing regression test)

the delta-debugging (meta-)algorithm for simplifying failure-inducing input:

given:
- program P to be executed (e.g., web browser)
- (observed) failure F (e.g., browser crashes) and corresponding input I
- test function T: determines whether P executed on some given input (e.g., on a
  given HTML page) produces F; result: failure occurs (x), doesn't occur (ok),
  unresolved (?)

want:
- minimal set of failure-inducing circumstances (which provides insight into the
  nature/cause of the failure)
  (e.g., a single HTML tag that causes the browser to crash)

how it works (key ideas):
- iteratively simplify input I such that failure F persists
- in each step: I' := I with some part removed;
  if T(P,I') = x: I := I' (i.e., continue with the reduced input)
- start by removing large chunks, then gradually reduce the size of the chunks removed
  (depending on circumstances, down to single characters)
  - e.g., geometric progression (1/2, 1/4, ...)
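A simplified Python sketch of this reduction loop (greedy chunk removal only, not
Zeller's full ddmin algorithm; the test function `fails` is a hypothetical stand-in
for T applied to P):

    # simplified delta-debugging-style input reduction (sketch, not full ddmin)
    # `fails(data)` is a hypothetical test: returns True iff P run on `data`
    # exhibits the failure F
    def simplify(data: str, fails) -> str:
        assert fails(data), "initial input must reproduce the failure"
        chunk = len(data) // 2                  # start by removing large chunks
        while chunk >= 1:
            reduced = False
            start = 0
            while start < len(data):
                candidate = data[:start] + data[start + chunk:]   # remove one chunk
                if candidate and fails(candidate):   # failure persists -> keep reduced input
                    data = candidate
                    reduced = True
                else:
                    start += chunk              # keep this chunk, try the next one
            if not reduced:
                chunk //= 2                     # nothing removable at this granularity: halve chunk size
        return data

    # hypothetical usage: find a small HTML snippet that still crashes the browser
    # minimal_input = simplify(html_page, fails=lambda s: browser_crashes_on(s))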
variations of this procedure can be used to narrow down failure-inducing differences
between a faulty run R_x (or version of P) and a working run R_ok (or version of P)
=> find a faulty run and a working run with *minimal difference*
   (makes it easy to detect / eliminate the root cause)
   [see Figure 3 from the paper: x ---> |  | <--- ok]

three approaches:
- modify R_x to be closer to R_ok
- modify R_ok to be closer to R_x
- do both (interleave)
=> dd algorithm, see paper

Literature:
- Andreas Zeller: From Automated Testing to Automated Debugging, 2000.
  http://www.infosun.fim.uni-passau.de/st/papers/computer2000/computer.pdf
- Andreas Zeller: Yesterday, my program worked. Today, it does not. Why?
  Proc. ESEC/FSE 99, Toulouse, France, September 1999, Vol. 1687 of LNCS, pp. 253-267.
- Andreas Zeller: Why Programs Fail: A Guide to Systematic Debugging.
  Morgan Kaufmann, 2nd edition, 2009.

Further information: http://www.st.cs.uni-saarland.de/dd/

---

6.4 Randomisation and pseudo-random number generation

As previously discussed, many algorithms use randomisation
=> increased performance robustness, chance of good performance
   (at the risk of bad performance)
   can be exploited (restarts / parallelisation - module 8)

Random number sources are also important in cryptography, simulation of stochastic
systems, computer art (e.g., music, film), games, gambling, etc.

[briefly discuss: randomised = probabilistic vs non-deterministic computation -
compare, e.g., the respective Turing machine models]

Standard computer hardware does not provide a source of true randomness
=> random numbers are typically generated in software
   - pseudo-random number generators (PRNGs)

Note: there are actual random number sources, e.g., random.org (based on atmospheric
noise), special devices (based on various physical phenomena)

PRNG: a procedure that, starting from a seed number, generates a sequence of
pseudo-random numbers, i.e., a sequence of numbers (typically integers) that
resembles a stream of independent samples from a uniform distribution

Advantage: use of PRNGs implies deterministic computation (given the seed)
-> reproducibility!
   (but: need to fix/store seeds; beware of OS/compiler dependence of the PRNG)

Disadvantage: deviations from true randomness can cause abnormal/undesired behaviour

--

What makes a good random number sequence?

1. uniform and unbiased, i.e., equal fractions of the generated numbers should fall
   into equal intervals
2. serially uncorrelated, i.e., n-tuples from the sequence should be independent of
   one another
3. long period; while ideally the generator should not cycle, in practice repetition
   should occur only after a very large set of numbers has been generated
   [ask students: why would a PRNG cycle?]
4. the algorithm underlying the PRNG and its implementation should be as efficient
   as possible
   [ask students: why does it matter? note on true random number sources]
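As an aside on the reproducibility point above: a minimal sketch of how seeds can be
fixed and stored in an empirical study, assuming Python's standard random module
(a Mersenne Twister implementation, see below) and a hypothetical randomised_algorithm
as the target:

    # sketch: reproducible runs of a randomised algorithm by fixing and storing seeds
    import random

    def randomised_algorithm(rng: random.Random) -> float:
        # hypothetical placeholder: all random decisions go through the passed-in rng
        return sum(rng.random() for _ in range(1000))

    def run_experiment(num_runs: int = 10, base_seed: int = 12345):
        results = []
        for i in range(num_runs):
            seed = base_seed + i          # store the seed together with the result,
            rng = random.Random(seed)     # so any individual run can be reproduced later
            results.append((seed, randomised_algorithm(rng)))
        return results

    # rerunning with the same seed reproduces a run exactly
    assert randomised_algorithm(random.Random(42)) == randomised_algorithm(random.Random(42))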
(see also Hoos & Stuetzle, 2005 (SLS:FA), pp. 52-54; Tompkins & Hoos, 2006)

Criteria 1+2 can be checked using statistical tests, e.g.:
- monobit test (equal numbers of ones and zeros in the sequence)
- poker test (a special instance of the chi-square test)
- runs test (counts the frequency of runs of various lengths)
- longruns test (checks whether there exists any run of length 34 or greater in
  20 000 bits of the sequence)
- autocorrelation test

-> automated implementations exist, e.g.:
- NIST software (http://csrc.nist.gov/rng) - performs 16 groups of tests covering a
  wide range of statistical properties
  (NIST = the US National Institute of Standards and Technology)
- newer TestU01 tests (L'Ecuyer & Simard, 2007)

see also:
- Rukhin, A., Soto, J., Nechvatal, J., Smid, M., Barker, E., Leigh, S., Levenson, M.,
  Vangel, M., Banks, D., Heckert, A., Dray, J., Vo, S.: A statistical test suite for
  random and pseudorandom number generators for cryptographic applications.
  Technical Report 800-22, NIST (2000)
- P. L'Ecuyer and R. Simard: TestU01: A C library for empirical testing of random
  number generators. ACM Transactions on Mathematical Software 33(4), Article 22,
  August 2007.

--

How to generate (good) pseudo-random number sequences?
(Not really a topic of this course, but ...)

2 methods:
- linear congruential generators (LCGs) - oldest and best-known pseudo-random number
  generator algorithms
- Mersenne Twister (MT) - modern method

LCG:
- I_{n+1} = (I_n * a + b) mod m, starting from seed I_0, with parameters a, b, m
  Example of an LCG (based on the ANSI C specification):
  I_{n+1} = (I_n * 1103515245 + 12345) mod MAX_INT
- fast, uses minimal memory
- quality of the pseudo-random numbers is extremely sensitive to the settings of a, b, m
- serial correlation between successive I_n
-> should not be used for applications where high-quality randomness is critical
   (a small implementation sketch appears at the end of this subsection)

Mersenne Twister:
- fast generation of very high-quality pseudo-random numbers, based on a generalised
  feedback shift register (with some bells and whistles)
- designed specifically to rectify many of the flaws found in older algorithms
- very long period, chosen to be a Mersenne prime (hence the name);
  MT19937 has a period of 2^19937 - 1
- high-quality random numbers (much better than LCGs)
- portable, freely & readily available for many platforms / programming languages
-> quickly becoming the PRNG of choice; standard, e.g., in R, MATLAB, Python, Ruby

NOTE:
- MT is often initialised with seeds obtained from an LCG to achieve a random initial
  state more quickly.
- In its native form, MT is *not* suitable for cryptography, since based on observing
  a relatively small number of iterates (for MT19937: only 624), all future iterates
  can be predicted. (Work-arounds exist.)

for details, see:
- http://en.wikipedia.org/wiki/Mersenne_twister
- Matsumoto, M., Nishimura, T.: Mersenne twister: a 623-dimensionally equidistributed
  uniform pseudo-random number generator. ACM Transactions on Modeling and Computer
  Simulation 8(1) (1998), 3-30
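Referring back to the LCG recurrence above, a minimal Python sketch (using the
ANSI-C-style constants from the example; taking MAX_INT to be 2^31 is an assumption
made here):

    # minimal LCG sketch: I_{n+1} = (I_n * a + b) mod m, with m = 2**31 assumed for MAX_INT
    def lcg(seed: int, a: int = 1103515245, b: int = 12345, m: int = 2**31):
        state = seed
        while True:
            state = (state * a + b) % m
            yield state

    # usage: draw a few pseudo-random integers and corresponding uniform floats in [0,1)
    gen = lcg(seed=42)
    ints = [next(gen) for _ in range(5)]
    floats = [x / 2**31 for x in ints]
    print(ints, floats)

For comparison, the Mersenne Twister mentioned above is what Python's standard random
module provides out of the box.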
--

How to generate pseudo-random numbers following different distributions?

Given:
- uniform random number source U
- F: CDF of the target distribution (from which we wish to sample)

Note: if we had a function S for sampling from F, then F(S), i.e., the distribution
of F(s) for samples s produced by S, would be uniform
=> invert F and apply it to U to obtain S, i.e., for any sample u from U, s := F^-1(u)
   [draw illustration]
   (need to truncate infinite tails - e.g., of a Gaussian - to finite values)

Special case: Box-Muller algorithm (1958)
- generates a pair of independent, standard normally distributed random number
  sequences Z1, Z2 from a pair of independent, uniformly distributed random number
  sequences U1, U2:
  Z1 = sqrt(-2 ln U1) * cos(2 pi U2), Z2 = sqrt(-2 ln U1) * sin(2 pi U2)
  (a small code sketch is included as an addendum at the end of these notes)

Alternative: Ziggurat algorithm (Marsaglia & Tsang, 2000)
- based on rejection sampling
- faster / more general than Box-Muller

see:
- http://en.wikipedia.org/wiki/Ziggurat_algorithm
- George Marsaglia, Wai Wan Tsang (2000): The Ziggurat Method for Generating Random
  Variables. Journal of Statistical Software 5(8). http://www.jstatsoft.org/v05/i08/paper

--

How important is the quality of random numbers?

Depends on the application:
- gambling, cryptography: very important
- Monte Carlo simulations (e.g., of physical systems): can be very important
- games: typically rather unimportant
- randomised algorithms for NP-hard problems: surprisingly unimportant
  (see Tompkins & Hoos, 2006)

Overall:
- except, perhaps, for highly sensitive cryptography and gambling applications, there
  is no compelling reason to use true random number sources
- but: for almost all randomised algorithms, there is also no reason to use
  low-quality or dubious PRNGs
- advice: use a (readily available) standard implementation of the Mersenne Twister (MT)

See also:
- Dave A.D. Tompkins and Holger H. Hoos: On the Quality and Quantity of Random
  Decisions in Stochastic Local Search for SAT. Proceedings of the 19th Conference of
  the Canadian Society for Computational Studies of Intelligence (AI 2006),
  Volume 4013 of Lecture Notes in Artificial Intelligence, pp. 146-158, 2006.
  -> http://www.cs.ubc.ca/~hoos/Publ/TomHoo06.pdf

---

learning goals:
- be able to name and explain criteria contributing to the quality of an algorithm or
  program and trade-offs between them
- be able to explain differences between and give examples for generic vs
  algorithm-specific improvements to an algorithm's behaviour
- be able to explain what the field of meta-algorithmics is concerned with and how it
  relates to the design of algorithms
- be able to explain how Occam's razor relates to algorithm design
- be able to explain the core ideas behind fuzz testing and to give examples of
  simple applications
- be able to explain the core ideas behind delta debugging and to give examples of
  simple applications
- be able to contrast pseudo-random number generators and true random number sources
- be able to name at least one standard method for generating high-quality, uniformly
  distributed pseudo-random numbers
- be able to explain how to generate pseudo-random numbers following different
  distributions
- be able to explain criteria and methods used for assessing good pseudo-random
  number sequences
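Addendum (illustration for section 6.4): a small Python sketch of inverse transform
sampling and of the Box-Muller formulas given above; the exponential distribution used
for the inverse-transform example is a hypothetical choice (its CDF inverts in closed
form), and Python's standard random module (Mersenne Twister) is assumed as the
uniform source U:

    # addendum sketch: sampling from non-uniform distributions using a uniform PRNG
    import math
    import random

    # (1) inverse transform sampling: s = F^-1(u) for u uniform in [0,1).
    # hypothetical example: exponential distribution with rate lam,
    # F(x) = 1 - exp(-lam*x), so F^-1(u) = -ln(1-u)/lam
    def sample_exponential(rng: random.Random, lam: float = 1.0) -> float:
        u = rng.random()
        return -math.log(1.0 - u) / lam

    # (2) Box-Muller: two independent standard normal samples from two uniforms
    def box_muller(rng: random.Random) -> tuple:
        u1, u2 = rng.random(), rng.random()
        u1 = max(u1, 1e-12)                  # guard against log(0)
        r = math.sqrt(-2.0 * math.log(u1))
        return r * math.cos(2 * math.pi * u2), r * math.sin(2 * math.pi * u2)

    rng = random.Random(0)
    print(sample_exponential(rng), box_muller(rng))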