CPSC 536H: Empirical Algorithmics (Spring 2008)
Notes by Holger H. Hoos, University of British Columbia
---------------------------------------------------------------------------------------
Module 4: Algorithms with error for decision problems
---------------------------------------------------------------------------------------
4.1 Introduction
For many decision problems there are no algorithms that always
find a correct solution for a given problem instance
(or always find it efficiently enough).
-> decision algorithms with error
Examples:
- local search algorithms for hard combinatorial problems like SAT
- algorithms for solving classification problems in machine learning
- heuristic construction algorithms for shortest paths, spanning trees, ...
Types of errors:
- false negatives (FN): incorrectly return "no" answer
- false positives (FP): incorrectly return "yes" answer
(note: these correspond to type I / type II errors in statistical hypothesis testing)
Algorithms with:
- one-sided error: only one type of error may occur (often: false negative)
- two-sided error: both types of error may occur
Closely related to algorithms with one-sided error:
non-termination / 'no solution found' (censored data)
---
4.2 Deterministic decision algorithms with error
Errors occur deterministically on certain instances.
Performance of algorithm on single instance is now characterised by
- type of error made
- run-time
Note: In some cases, errors can be easily detected, in others not.
(This has practical implications on effort required for empirical analysis.)
Performance of algorithm on set of instances S is characterised by
[ask students]
- relative error (for one or both types of errors) = FN/|S|, FP/|S|
- SCDs for correct results, false neg results, false pos results
It is often very interesting to investigate how instance properties
correlate with error rate(s).
[ask students: what about parameterised alg?]
For parameterised algorithms, the correlation of parameter values
with error rate(s) is also of substantial interest.
Often, there are one or more parameters whose setting(s) control
trade-off between
- run-time and error rate(s)
- two types of errors (for algorithms with two-sided errors)
[ask students: examples?]
studying correlation between run-time / error rate(s)
as parameter(s) change can be insightful
(and useful for calibrating algorithm)
instead of error probabilities (= false pos / false neg rates), often consider
- sensitivity SE = TP/(TP+FN) = fraction of "yes" instances correctly solved (aka true positive rate, TPR)
Note: TP+FN = total number of "yes" instances
- specificity SP = TN/(TN+FP) = fraction of "no" instances correctly solved (aka true neg rate, TNR)
Note: TN+FP = total number of "no" instances
also:
- false pos rate FPR = 1-SP, false neg rate FNR = 1-SE
- pos predictive value PPV = TP/(TP+FP) = fraction of "yes" answers that are correct
Note: TP+FP = total number of "yes" answers
- neg predictive value NPV = TN/(FN+TN) = fraction of "no" answers that are correct
Note: FN+TN = total number of "no" answers
- F-measure = 2*PPV*SE/(PPV+SE) = harmonic mean of PPV, SE
Note: PPV and sensitivity are also known as 'precision' and 'recall', respectively, in information retrieval
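The measures above can be computed directly from the four confusion-matrix counts; a minimal sketch (counts here are made up for illustration), with variable names following the notes:

```python
# Sketch: computing the measures defined above from raw confusion-matrix
# counts TP, FP, TN, FN (assumes all denominators are nonzero).

def confusion_measures(tp, fp, tn, fn):
    """Return the error/accuracy measures defined above as a dict."""
    se = tp / (tp + fn)             # sensitivity = TPR (recall); TP+FN = # "yes" instances
    sp = tn / (tn + fp)             # specificity = TNR; TN+FP = # "no" instances
    ppv = tp / (tp + fp)            # positive predictive value (precision)
    npv = tn / (fn + tn)            # negative predictive value
    f = 2 * ppv * se / (ppv + se)   # F-measure = harmonic mean of PPV and SE
    return {"SE": se, "SP": sp, "PPV": ppv, "NPV": npv,
            "FPR": 1 - sp, "FNR": 1 - se, "F": f}

# illustrative counts: 40 "yes" instances (32 solved), 60 "no" instances (54 solved)
m = confusion_measures(tp=32, fp=6, tn=54, fn=8)
print(m["SE"], m["SP"])   # 0.8 0.9
```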
receiver operating characteristic (ROC) curves:
graphical plot of sensitivity vs. (1 - specificity) as some algorithm (or process) parameter is varied
(typically used for binary classifiers as classification threshold is varied)
[see http://en.wikipedia.org/wiki/Receiver_operating_characteristic,
http://www.cs.ucl.ac.uk/staff/W.Langdon/roc/,
http://www.anaesthetist.com/mnm/stats/roc/ - nice interactive demonstrations!]
[slides]
Note: there are quantitative measures to summarise ROC curves, but these always lose information
about the underlying trade-off between the error types
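As a sketch of the threshold-sweep construction (scores and labels below are made up, not data from the notes), the following traces the (FPR, TPR) = (1-SP, SE) points of a ROC curve and computes the area under the curve (AUC), one of the quantitative summary measures mentioned above:

```python
# Sketch: ROC points from a binary classifier's scores, obtained by sweeping
# the classification threshold; each threshold yields one (FPR, TPR) point.

def roc_points(scores, labels):
    """Return (FPR, TPR) points, sorted, for all distinct thresholds."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = set()
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.add((fp / neg, tp / pos))
    return sorted(points | {(0.0, 0.0), (1.0, 1.0)})

def auc(points):
    """Area under the ROC curve by the trapezoid rule (a common summary;
    note that any such scalar summary loses trade-off information)."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# toy data: higher score = stronger evidence for a "yes" answer
scores = [0.9, 0.8, 0.7, 0.55, 0.5, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,    0,   1,   0,   0]
pts = roc_points(scores, labels)
print(auc(pts))   # 0.8125
```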
--
mini case study: RNA secondary structure prediction data from Mirela Andronescu (UBC/CS)
(exemplifies and slightly generalises methods / concepts from 4.2)
RNA secondary structure prediction produces a set of predicted base pairs for
each RNA molecule (= problem instance);
RNA molecules are modelled as strings of bases,
in the correct secondary structure (there is only one per molecule), each base can be
- unpaired
- paired with exactly one other base (at different position)
-> base pairing can be captured as symmetric binary matrix (p_ij),
where p_ij=1 if the base at pos i is paired with the base at pos j, 0 otherwise
<= 1 pairing partner per base -> at most one 1 per row (and per column)
in a predicted structure, each base can be
- correctly unpaired (i.e., predicted unpaired & unpaired in correct structure)
- correctly paired (i.e., predicted paired with the same partner as paired with in correct structure)
- incorrectly paired (i.e., predicted paired while unpaired in correct structure
or predicted paired with a partner different from that in correct structure)
= false positive
- incorrectly unpaired (i.e., predicted unpaired while paired in correct structure)
= false negative
Note:
- this can be seen as making a decision for each base, i.e., set of decisions
per problem instance - but these are not independent,
since each base can pair with only one other base
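The per-base classification above can be sketched as follows (the toy molecule and its structures are hypothetical, not Andronescu's data); structures are represented as dicts mapping base position -> pairing partner, with unpaired bases absent:

```python
# Sketch: classify each base of a predicted RNA secondary structure against
# the correct structure, using the four categories defined above.

def per_base_counts(correct, predicted, n):
    """Count TP (correctly paired), FP (incorrectly paired),
    FN (incorrectly unpaired) and TN (correctly unpaired) over n bases."""
    tp = fp = fn = tn = 0
    for i in range(n):
        c, p = correct.get(i), predicted.get(i)
        if p is not None:          # predicted paired
            if c == p:
                tp += 1            # paired with the correct partner
            else:
                fp += 1            # wrong partner, or should be unpaired
        else:                      # predicted unpaired
            if c is None:
                tn += 1            # correctly unpaired
            else:
                fn += 1            # incorrectly unpaired
    return tp, fp, fn, tn

# toy molecule of 8 bases: correct structure pairs (0,7) and (2,5);
# the prediction gets one pair right and one wrong
correct   = {0: 7, 7: 0, 2: 5, 5: 2}
predicted = {0: 7, 7: 0, 2: 4, 4: 2}
print(per_base_counts(correct, predicted, 8))   # (2, 2, 1, 3)
```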
How should this type of algorithm be evaluated empirically?
[ask students]
- measure accuracy, various types of errors per molecule
- study distribution of these measures over benchmark set of molecules
-> correlation plots showing combinations of (intra-molecular) sensitivity vs. specificity
or equivalent measures
[show analysis of data in gnuplot]
- correlation with properties of molecules? (e.g., length = # bases?)
[show analysis of data in gnuplot]
[ask students: what further analysis do the observed results suggest?]
---
4.3 Randomised algorithms with one-sided error
Monte-Carlo Algorithms (MCAs)
- decision algorithm whose run-time is a random variable
- may produce false negative and/or false positive results
Note: Multiple runs of the algorithm on the same problem instance may produce correct results or errors.
Here: MCA with one-sided error (typically false neg)
Similarly: Generalised Las Vegas algorithms
= LVA that may not always terminate (with a result)
Behaviour on single instance (using given parameter values) is characterised by
- error probability (or success probability = 1-error prob)
- RTD for runs with correct results
- RTD for runs with incorrect results
Reduction of error probability (= amplification of success prob) by
multiple independent runs:
Key Insight:
By performing multiple independent runs of randomised MCA with one-sided error,
error probability can be traded off against run-time.
Can this be exploited in practice?
[ask students]
[two cases:
- errors can be detected efficiently -> perform runs until success
- errors cannot be detected efficiently -> multiple runs + voting scheme
(effective only if the per-run error probability is low enough)]
Multiple independent runs can be executed sequentially or in parallel.
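The first case above (detectable errors, repeat until success) can be quantified with a short sketch; numbers here are illustrative. With per-run error probability p, k independent runs all fail with probability p^k, so the number of runs needed for a target error probability follows directly:

```python
# Sketch: error-probability amplification for an MCA with one-sided error
# and efficiently detectable errors. k independent runs fail only if all
# k runs fail, i.e., with probability p**k.

import math

def runs_needed(p, target):
    """Smallest k such that p**k <= target (0 < p < 1)."""
    return math.ceil(math.log(target) / math.log(p))

p = 0.5                        # per-run error probability (illustrative)
print(runs_needed(p, 1e-6))    # 20 runs suffice: 0.5**20 ~ 9.5e-7
```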
Analysis on sets of instances:
- analyse run-time and errors separately on individual instances
and across set (for run-time, see Module 3; for error, analogously)
- correlation between run-time stats and error rate (over multiple runs)
across test set can be interesting [ask students: why?]
Investigation of trade-off between error rate and run-time:
see 4.2; keep in mind that here, occurrence of errors can vary
between runs on single instance and across instances
---
4.4 Randomised algorithms with two-sided error
MCA with two-sided error: both types of error may occur
Note: on each given instance, only one type of error can occur
[ask students: why?]
-> for single instances: same approach as in 4.3
Same approach as for MCA with one-sided error, but
- measure / study both types of error separately
- investigate correlation / trade-offs between the two types of errors
(see also 4.2)
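When errors cannot be detected, the voting scheme from 4.3 still applies here; a sketch with illustrative numbers: if the per-run error probability is p < 1/2, a majority vote over k independent runs errs only if more than half the runs err, and this probability shrinks as k grows:

```python
# Sketch: majority voting over k independent runs of an MCA with per-run
# error probability p; the vote is wrong iff more than k/2 runs are wrong
# (use odd k to avoid ties).

from math import comb

def majority_error(p, k):
    """P(majority of k independent runs is wrong), per-run error prob p."""
    return sum(comb(k, i) * p**i * (1 - p)**(k - i)
               for i in range(k // 2 + 1, k + 1))

p = 0.3   # illustrative per-run error probability
for k in (1, 11, 51):
    print(k, majority_error(p, k))
```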
---
learning goals:
- be able to explain the two types of errors that may occur in a decision algorithm with error
- be able to explain the concept and definition of a Monte-Carlo algorithm
- be able to explain and apply different measures capturing the occurrence of erroneous results
- be able to analyse trade-offs between error probability and run-time
- be able to analyse trade-offs between the two types of errors
- be able to explain and apply the concept of ROC curves
- be able to explain under which conditions and how multiple independent runs can be
used to reduce the error probability of a Monte-Carlo algorithm