CPSC 536H: Empirical Algorithmics (Spring 2012)
Notes by Holger H. Hoos, University of British Columbia

---------------------------------------------------------------------------------------
Module 6: General considerations for empirical design methods
---------------------------------------------------------------------------------------

6.1 What makes a good algorithm (for solving a given problem)?

Correctness: need to have correct behaviour
  -> *testing and debugging*
  what about algorithms with error? randomised algorithms? parallel algorithms?

Performance: avg/peak/robustness = consistency (outliers)
  have already discussed how to assess performance, but how do we achieve good
  performance? -> modules 7-10 (various methods)

Reproducibility of behaviour:
  - dependence on execution environment (OS platform, machine, ...)
    - already covered in module 1
  - randomness: is it a problem? *random number generators / seeds*
    will see benefits in module 8

Configurability / parameters: many = good (or perhaps bad?)

Ease of implementation, evolvability & reusability:
  initial implementation vs modifications vs reuse
  -> software engineering topics, but they gain special importance in the context of
     automated design methods

Theoretical guarantees - covered (to some extent) in module 10

Note:
- typically, there is no single best algorithm for a given problem
  -> parameters and configuration (module 7)
     algorithm selection (module 9)
     portfolios (module 10)
- generic improvements (i.e., ones independent of the details of the problem or
  algorithm) are often possible - this is what we focus on in the following
- we are interested in principled, automated design methods
  -> algorithms (fully formalised, automated procedures) that create, transform and
     analyse algorithms (target algorithms)
  -> meta-algorithmics - the study of such meta-algorithms

---

6.2 Ease of implementation, evolvability & reusability

- how to measure? [ask students] person hours? description length? ...
- rapid prototyping, modularity, algorithm frameworks
- Occam's razor: conceptual simplicity vs performance
- automated algorithm design: code bloat, optimisation for simplicity,
  quantified trade-off between performance and complexity

---

6.3 Some notes on testing & debugging

Goal: ascertain/improve correctness of an algorithm / implementation using empirical
methods (as opposed to mathematical proofs of correctness)

To achieve this: iteratively detect & remedy incorrect behaviour (bugs)

Debugging = the process of identifying and correcting the root cause of a failure
[Zeller, 2000]

--

Fuzz testing (or fuzzing)
[see http://en.wikipedia.org/wiki/Fuzz_testing, http://pages.cs.wisc.edu/~bart/fuzz/]

key idea: automatically find circumstances under which programs fail (bugs)
by providing automatically constructed, random(ised) input data.
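A minimal mutation-based fuzzing loop might look roughly like the Python sketch below
(the target program, seed file and crash criterion are hypothetical placeholders;
how fuzzers are realised in practice is discussed in more detail in the notes that follow):

    # minimal mutation-based fuzzing sketch (hypothetical target command & seed file)
    # repeatedly mutates a seed input by flipping random bits, runs the target
    # program on the mutated input, and records inputs that make it crash
    import random
    import subprocess

    TARGET_CMD = ["./target_program"]   # hypothetical program under test; takes an input file
    SEED_FILE = "seed_input.bin"        # hypothetical existing (non-empty) valid input

    def mutate(data: bytes, num_flips: int = 16) -> bytes:
        buf = bytearray(data)
        for _ in range(num_flips):
            pos = random.randrange(len(buf))
            buf[pos] ^= 1 << random.randrange(8)   # flip one random bit
        return bytes(buf)

    def fuzz(iterations: int = 1000) -> None:
        seed_data = open(SEED_FILE, "rb").read()
        for i in range(iterations):
            candidate = mutate(seed_data)
            with open("fuzz_input.bin", "wb") as f:
                f.write(candidate)
            result = subprocess.run(TARGET_CMD + ["fuzz_input.bin"], capture_output=True)
            if result.returncode < 0:   # on POSIX: negative return code = killed by a signal (crash)
                print("iteration %d: crash (signal %d)" % (i, -result.returncode))
                with open("crash_%d.bin" % i, "wb") as f:
                    f.write(candidate)

    if __name__ == "__main__":
        fuzz()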
realisation: a software tool (fuzzer) iteratively constructs inputs and runs the given
program, recording the result (run failed or successful)

widely used in industry; most common targets: file formats and protocols,
but applicable to all types of program input
enhances software safety & security

oldest and simplest form (Miller et al., 1990): input = random stream of bits

more advanced:
- (random) transformations (= mutations) of existing inputs (from test suites,
  real-life data), e.g. by flipping bits at random, moving/swapping chunks
  (such as blocks of files)
- grammar-based generation or transformation of input
- white-box fuzzing [see http://research.microsoft.com/en-us/projects/atg/ndss2008.pdf],
  evolutionary fuzzing [see http://www.vdalabs.com/tools/efs_gpf.html]

can focus on valid, invalid and mostly-valid inputs

The first paper on fuzz testing:
B.P. Miller, L. Fredriksen, and B. So: An Empirical Study of the Reliability of UNIX
Utilities. Communications of the ACM 33(12), December 1990.
(originated from a grad course project in 1988!)

Further information/literature: http://pages.cs.wisc.edu/~bart/fuzz/

--

Delta debugging
[see http://www.st.cs.uni-saarland.de/dd/]
[material here mostly based on Andreas Zeller: From Automated Testing to Automated
Debugging (2000)]

note [Zeller, 2000]: "debugging is experimental science: noticing something, one
wonders why it happens, and one sets up a number of hypotheses which one confirms
or refutes by means of experiments."
-> scientific method!

goal: automate the scientific method of debugging

key idea: isolate failure causes (= failure-inducing circumstances) automatically,
by systematically narrowing down failure-inducing circumstances until a minimal set
remains (minimal: cannot be reduced any further)
-> delta debugging algorithm

applications: isolation of failure-inducing
- program input (e.g., HTML pages that cause failure of a web browser,
  CNF formulae for which a SAT solver gives an incorrect answer, ...)
- user interaction (e.g., keystrokes that cause a program crash)
- changes to the program code (e.g., after a failing regression test)

the delta-debugging (meta-)algorithm for simplifying failure-inducing input:

given:
- program P to be executed (e.g., web browser)
- (observed) failure F (e.g., browser crashes) and corresponding input I
- test function T: determines whether P executed on some given input (e.g., on a
  given HTML page) produces F; result: failure occurs (x), doesn't occur (ok),
  unresolved (?)

want:
- minimal set of failure-inducing circumstances (which provides insight into the
  nature/cause of the failure)
  (e.g., a single HTML tag that causes the browser to crash)

how it works (key ideas):
- iteratively simplify input I such that failure F persists
- in each step: I' := I with some part removed;
  if T(P,I') = x: I := I' (i.e., continue with the reduced input)
- start by removing large chunks, then gradually reduce the size of the chunks removed
  (depending on circumstances, down to single characters)
  - e.g., geometric progression (1/2, 1/4, ...)
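A simplified Python sketch of this reduction loop (greedy chunk removal only, not
Zeller's full ddmin algorithm; the test function `fails` is a hypothetical stand-in
for T applied to P):

    # simplified delta-debugging-style input reduction (sketch, not full ddmin)
    # `fails(data)` is a hypothetical test: returns True iff P run on `data`
    # exhibits the failure F
    def simplify(data: str, fails) -> str:
        assert fails(data), "initial input must reproduce the failure"
        chunk = len(data) // 2                  # start by removing large chunks
        while chunk >= 1:
            reduced = False
            start = 0
            while start < len(data):
                candidate = data[:start] + data[start + chunk:]   # remove one chunk
                if candidate and fails(candidate):   # failure persists -> keep reduced input
                    data = candidate
                    reduced = True
                else:
                    start += chunk              # keep this chunk, try the next one
            if not reduced:
                chunk //= 2                     # nothing removable at this granularity: halve chunk size
        return data

    # hypothetical usage: find a small HTML snippet that still crashes the browser
    # minimal_input = simplify(html_page, fails=lambda s: browser_crashes_on(s))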
variations of this procedure can be used to narrow down failure-inducing differences
between a faulty run R_x (or version of P) and a working run R_ok (or version of P)
=> find a faulty run and a working run with *minimal difference*
   (makes it easy to detect / eliminate the root cause)
   [see Figure 3 from the paper: x ---> |  | <--- ok]

three approaches:
- modify R_x to be closer to R_ok
- modify R_ok to be closer to R_x
- do both (interleave)
=> dd algorithm, see paper

Literature:
- Andreas Zeller: From Automated Testing to Automated Debugging, 2000.
  http://www.infosun.fim.uni-passau.de/st/papers/computer2000/computer.pdf
- Andreas Zeller: Yesterday, my program worked. Today, it does not. Why?
  Proc. ESEC/FSE 99, Toulouse, France, September 1999, Vol. 1687 of LNCS, pp. 253-267.
- Andreas Zeller: Why Programs Fail: A Guide to Systematic Debugging.
  Morgan Kaufmann, 2nd edition, 2009.

Further information: http://www.st.cs.uni-saarland.de/dd/

---

6.4 Randomisation and pseudo-random number generation

As previously discussed, many algorithms use randomisation
=> increased performance robustness, chance of good performance
   (at the risk of bad performance)
   can be exploited (restarts / parallelisation - module 8)

Random number sources are also important in cryptography, simulation of stochastic
systems, computer art (e.g., music, film), games, gambling, etc.

[briefly discuss: randomised = probabilistic vs non-deterministic computation -
compare, e.g., the respective Turing machine models]

Standard computer hardware does not provide a source of true randomness
=> random numbers are typically generated in software
   - pseudo-random number generators (PRNGs)

Note: there are actual random number sources, e.g., random.org (based on atmospheric
noise), special devices (based on various physical phenomena)

PRNG: a procedure that, starting from a seed number, generates a sequence of
pseudo-random numbers, i.e., a sequence of numbers (typically integers) that
resembles a stream of independent samples from a uniform distribution

Advantage: use of PRNGs implies deterministic computation (given the seed)
-> reproducibility!
   (but: need to fix/store seeds; beware of OS/compiler dependence of the PRNG)

Disadvantage: deviations from true randomness can cause abnormal/undesired behaviour

--

What makes a good random number sequence?

1. uniform and unbiased, i.e., equal fractions of the generated numbers should fall
   into equal intervals
2. serially uncorrelated, i.e., n-tuples from the sequence should be independent of
   one another
3. long period; while ideally the generator should not cycle, in practice repetition
   should occur only after a very large set of numbers has been generated
   [ask students: why would a PRNG cycle?]
4. the algorithm underlying the PRNG and its implementation should be as efficient
   as possible
   [ask students: why does it matter? note on true random number sources]
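As an aside on the reproducibility point above: a minimal sketch of how seeds can be
fixed and stored in an empirical study, assuming Python's standard random module
(a Mersenne Twister implementation, see below) and a hypothetical randomised_algorithm
as the target:

    # sketch: reproducible runs of a randomised algorithm by fixing and storing seeds
    import random

    def randomised_algorithm(rng: random.Random) -> float:
        # hypothetical placeholder: all random decisions go through the passed-in rng
        return sum(rng.random() for _ in range(1000))

    def run_experiment(num_runs: int = 10, base_seed: int = 12345):
        results = []
        for i in range(num_runs):
            seed = base_seed + i          # store the seed together with the result,
            rng = random.Random(seed)     # so any individual run can be reproduced later
            results.append((seed, randomised_algorithm(rng)))
        return results

    # rerunning with the same seed reproduces a run exactly
    assert randomised_algorithm(random.Random(42)) == randomised_algorithm(random.Random(42))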
(see also Hoos & Stuetzle, 2005 (SLS:FA), pp. 52-54; Tompkins & Hoos, 2006)

Criteria 1+2 can be checked using statistical tests, e.g.:
- monobit test (equal numbers of ones and zeros in the sequence)
- poker test (a special instance of the chi-square test)
- runs test (counts the frequency of runs of various lengths)
- longruns test (checks whether there exists any run of length 34 or greater in
  20 000 bits of the sequence)
- autocorrelation test

-> automated implementations exist, e.g.:
- NIST software (http://csrc.nist.gov/rng) - performs 16 groups of tests covering a
  wide range of statistical properties
  (NIST = the US National Institute of Standards and Technology)
- newer TestU01 tests (L'Ecuyer & Simard, 2007)

see also:
- Rukhin, A., Soto, J., Nechvatal, J., Smid, M., Barker, E., Leigh, S., Levenson, M.,
  Vangel, M., Banks, D., Heckert, A., Dray, J., Vo, S.: A statistical test suite for
  random and pseudorandom number generators for cryptographic applications.
  Technical Report 800-22, NIST (2000)
- P. L'Ecuyer and R. Simard: TestU01: A C library for empirical testing of random
  number generators. ACM Transactions on Mathematical Software 33(4), Article 22,
  August 2007.

--

How to generate (good) pseudo-random number sequences?
(Not really a topic of this course, but ...)

2 methods:
- linear congruential generators (LCGs) - oldest and best-known pseudo-random number
  generator algorithms
- Mersenne Twister (MT) - modern method

LCG:
- I_{n+1} = (I_n * a + b) mod m, starting from seed I_0, with parameters a, b, m
  Example of an LCG (based on the ANSI C specification):
  I_{n+1} = (I_n * 1103515245 + 12345) mod MAX_INT
- fast, uses minimal memory
- quality of the pseudo-random numbers is extremely sensitive to the settings of a, b, m
- serial correlation between successive I_n
-> should not be used for applications where high-quality randomness is critical
   (a small implementation sketch appears at the end of this subsection)

Mersenne Twister:
- fast generation of very high-quality pseudo-random numbers, based on a generalised
  feedback shift register (with some bells and whistles)
- designed specifically to rectify many of the flaws found in older algorithms
- very long period, chosen to be a Mersenne prime (hence the name);
  MT19937 has a period of 2^19937 - 1
- high-quality random numbers (much better than LCGs)
- portable, freely & readily available for many platforms / programming languages
-> quickly becoming the PRNG of choice; standard, e.g., in R, MATLAB, Python, Ruby

NOTE:
- MT is often initialised with seeds obtained from an LCG to achieve a random initial
  state more quickly.
- In its native form, MT is *not* suitable for cryptography, since based on observing
  a relatively small number of iterates (for MT19937: only 624), all future iterates
  can be predicted. (Work-arounds exist.)

for details, see:
- http://en.wikipedia.org/wiki/Mersenne_twister
- Matsumoto, M., Nishimura, T.: Mersenne twister: a 623-dimensionally equidistributed
  uniform pseudo-random number generator. ACM Transactions on Modeling and Computer
  Simulation 8(1) (1998), 3-30
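Referring back to the LCG recurrence above, a minimal Python sketch (using the
ANSI-C-style constants from the example; taking MAX_INT to be 2^31 is an assumption
made here):

    # minimal LCG sketch: I_{n+1} = (I_n * a + b) mod m, with m = 2**31 assumed for MAX_INT
    def lcg(seed: int, a: int = 1103515245, b: int = 12345, m: int = 2**31):
        state = seed
        while True:
            state = (state * a + b) % m
            yield state

    # usage: draw a few pseudo-random integers and corresponding uniform floats in [0,1)
    gen = lcg(seed=42)
    ints = [next(gen) for _ in range(5)]
    floats = [x / 2**31 for x in ints]
    print(ints, floats)

For comparison, the Mersenne Twister mentioned above is what Python's standard random
module provides out of the box.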
--

How to generate pseudo-random numbers following different distributions?

Given:
- uniform random number source U
- F: CDF of the target distribution (from which we wish to sample)

Note: if we had a function S for sampling from F, then F(S), i.e., the distribution
of F(s) for samples s produced by S, would be uniform
=> invert F and apply it to U to obtain S, i.e., for any sample u from U, s := F^-1(u)
   [draw illustration]
   (need to truncate infinite tails - e.g., of a Gaussian - to finite values)

Special case: Box-Muller algorithm (1958)
- generates a pair of independent, standard normally distributed random number
  sequences Z1, Z2 from a pair of independent, uniformly distributed random number
  sequences U1, U2:
  Z1 = sqrt(-2 ln U1) * cos(2 pi U2), Z2 = sqrt(-2 ln U1) * sin(2 pi U2)
  (a small code sketch is included as an addendum at the end of these notes)

Alternative: Ziggurat algorithm (Marsaglia & Tsang, 2000)
- based on rejection sampling
- faster / more general than Box-Muller

see:
- http://en.wikipedia.org/wiki/Ziggurat_algorithm
- George Marsaglia, Wai Wan Tsang (2000): The Ziggurat Method for Generating Random
  Variables. Journal of Statistical Software 5(8). http://www.jstatsoft.org/v05/i08/paper

--

How important is the quality of random numbers?

Depends on the application:
- gambling, cryptography: very important
- Monte Carlo simulations (e.g., of physical systems): can be very important
- games: typically rather unimportant
- randomised algorithms for NP-hard problems: surprisingly unimportant
  (see Tompkins & Hoos, 2006)

Overall:
- except, perhaps, for highly sensitive cryptography and gambling applications, there
  is no compelling reason to use true random number sources
- but: for almost all randomised algorithms, there is also no reason to use
  low-quality or dubious PRNGs
- advice: use a (readily available) standard implementation of the Mersenne Twister (MT)

See also:
- Dave A.D. Tompkins and Holger H. Hoos: On the Quality and Quantity of Random
  Decisions in Stochastic Local Search for SAT. Proceedings of the 19th Conference of
  the Canadian Society for Computational Studies of Intelligence (AI 2006),
  Volume 4013 of Lecture Notes in Artificial Intelligence, pp. 146-158, 2006.
  -> http://www.cs.ubc.ca/~hoos/Publ/TomHoo06.pdf

---

learning goals:
- be able to name and explain criteria contributing to the quality of an algorithm or
  program and trade-offs between them
- be able to explain differences between and give examples for generic vs
  algorithm-specific improvements to an algorithm's behaviour
- be able to explain what the field of meta-algorithmics is concerned with and how it
  relates to the design of algorithms
- be able to explain how Occam's razor relates to algorithm design
- be able to explain the core ideas behind fuzz testing and to give examples of
  simple applications
- be able to explain the core ideas behind delta debugging and to give examples of
  simple applications
- be able to contrast pseudo-random number generators and true random number sources
- be able to name at least one standard method for generating high-quality, uniformly
  distributed pseudo-random numbers
- be able to explain how to generate pseudo-random numbers following different
  distributions
- be able to explain criteria and methods used for assessing good pseudo-random
  number sequences
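Addendum (illustration for section 6.4): a small Python sketch of inverse transform
sampling and of the Box-Muller formulas given above; the exponential distribution used
for the inverse-transform example is a hypothetical choice (its CDF inverts in closed
form), and Python's standard random module (Mersenne Twister) is assumed as the
uniform source U:

    # addendum sketch: sampling from non-uniform distributions using a uniform PRNG
    import math
    import random

    # (1) inverse transform sampling: s = F^-1(u) for u uniform in [0,1).
    # hypothetical example: exponential distribution with rate lam,
    # F(x) = 1 - exp(-lam*x), so F^-1(u) = -ln(1-u)/lam
    def sample_exponential(rng: random.Random, lam: float = 1.0) -> float:
        u = rng.random()
        return -math.log(1.0 - u) / lam

    # (2) Box-Muller: two independent standard normal samples from two uniforms
    def box_muller(rng: random.Random) -> tuple:
        u1, u2 = rng.random(), rng.random()
        u1 = max(u1, 1e-12)                  # guard against log(0)
        r = math.sqrt(-2.0 * math.log(u1))
        return r * math.cos(2 * math.pi * u2), r * math.sin(2 * math.pi * u2)

    rng = random.Random(0)
    print(sample_exponential(rng), box_muller(rng))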