CPSC 536H: Empirical Algorithmics (Spring 2008)
Notes by Holger H. Hoos, University of British Columbia
---------------------------------------------------------------------------------------
Module 4: Algorithms with error for decision problems
---------------------------------------------------------------------------------------
4.1 Introduction
For many decision problems there are no algorithms that always
find a correct solution for a given problem instance
(or always find it efficiently enough).
-> decision algorithms with error
Examples:
- local search algorithms for hard combinatorial problems like SAT
- algorithms for solving classification problems in machine learning
- heuristic construction algorithms for shortest paths, spanning trees, ...
Types of errors:
- false negatives (FN): incorrectly return "no" answer
- false positives (FP): incorrectly return "yes" answer
(note: these correspond to type I / type II errors in statistical hypothesis testing)
Algorithms with:
- one-sided error: only one type of error may occur (often: false negative)
- two-sided error: both types of error may occur
Closely related to algorithms with one-sided error:
non-termination / 'no solution found' (censored data)
---
4.2 Deterministic decision algorithms with error
Errors occur deterministically on certain instances.
Performance of algorithm on single instance is now characterised by
- type of error made
- run-time
Note: In some cases, errors can be easily detected, in others not.
(This has practical implications on effort required for empirical analysis.)
Performance of algorithm on set of instances S is characterised by
[ask students]
- relative error (for one or both types of errors) = FN/|S|, FP/|S|
- SCDs for correct results, false neg results, false pos results
It is often very interesting to investigate how instance properties
correlate with error rate(s).
[ask students: what about parameterised alg?]
For parameterised algorithms, the correlation of parameter values
with error rate(s) is also of substantial interest.
Often, there are one or more parameters whose setting(s) control
trade-off between
- run-time and error rate(s)
- two types of errors (for algorithms with two-sided errors)
[ask students: examples?]
studying correlation between run-time / error rate(s)
as parameter(s) change can be insightful
(and useful for calibrating algorithm)
instead of error probabilities (= false pos / false neg rates), often consider
- sensitivity SE = TP/(TP+FN) = fraction of "yes" instances correctly solved (aka true positive rate, TPR)
Note: TP+FN = total number of "yes" instances
- specificity SP = TN/(TN+FP) = fraction of "no" instances correctly solved (aka true neg rate, TNR)
Note: TN+FP = total number of "no" instances
also:
- false pos rate FPR = 1-SP, false neg rate FNR = 1-SE
- pos predictive value PPV = TP/(TP+FP) = fraction of "yes" answers that are correct
Note: TP+FP = total number of "yes" answers
- neg predictive value NPV = TN/(FN+TN) = fraction of "no" answers that are correct
Note: FN+TN = total number of "no" answers
- F-measure = 2*PPV*SE/(PPV+SE) = harmonic mean of PPV, SE
Note: PPV and sensitivity are also known as 'precision' and 'recall', respectively, in information retrieval
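The measures above can be computed directly from the four confusion-matrix counts; a minimal sketch (counts here are made up for illustration), with variable names following the notes:

```python
# Sketch: computing the measures defined above from raw confusion-matrix
# counts TP, FP, TN, FN (assumes all denominators are nonzero).

def confusion_measures(tp, fp, tn, fn):
    """Return the error/accuracy measures defined above as a dict."""
    se = tp / (tp + fn)             # sensitivity = TPR (recall); TP+FN = # "yes" instances
    sp = tn / (tn + fp)             # specificity = TNR; TN+FP = # "no" instances
    ppv = tp / (tp + fp)            # positive predictive value (precision)
    npv = tn / (fn + tn)            # negative predictive value
    f = 2 * ppv * se / (ppv + se)   # F-measure = harmonic mean of PPV and SE
    return {"SE": se, "SP": sp, "PPV": ppv, "NPV": npv,
            "FPR": 1 - sp, "FNR": 1 - se, "F": f}

# illustrative counts: 40 "yes" instances (32 solved), 60 "no" instances (54 solved)
m = confusion_measures(tp=32, fp=6, tn=54, fn=8)
print(m["SE"], m["SP"])   # 0.8 0.9
```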
receiver operating characteristic (ROC) curves:
graphical plot of sensitivity vs. (1 - specificity) as some algorithm (or process) parameter is varied
(typically used for binary classifiers as classification threshold is varied)
[see http://en.wikipedia.org/wiki/Receiver_operating_characteristic,
http://www.cs.ucl.ac.uk/staff/W.Langdon/roc/,
http://www.anaesthetist.com/mnm/stats/roc/ - nice interactive demonstrations!]
[slides]
Note: there are quantitative measures to summarise ROC curves, but these always lose information
about the underlying trade-off between the error types
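As a sketch of the threshold-sweep construction (scores and labels below are made up, not data from the notes), the following traces the (FPR, TPR) = (1-SP, SE) points of a ROC curve and computes the area under the curve (AUC), one of the quantitative summary measures mentioned above:

```python
# Sketch: ROC points from a binary classifier's scores, obtained by sweeping
# the classification threshold; each threshold yields one (FPR, TPR) point.

def roc_points(scores, labels):
    """Return (FPR, TPR) points, sorted, for all distinct thresholds."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = set()
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.add((fp / neg, tp / pos))
    return sorted(points | {(0.0, 0.0), (1.0, 1.0)})

def auc(points):
    """Area under the ROC curve by the trapezoid rule (a common summary;
    note that any such scalar summary loses trade-off information)."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# toy data: higher score = stronger evidence for a "yes" answer
scores = [0.9, 0.8, 0.7, 0.55, 0.5, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,    0,   1,   0,   0]
pts = roc_points(scores, labels)
print(auc(pts))   # 0.8125
```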
--
mini case study: RNA secondary structure prediction data from Mirela Andronescu (UBC/CS)
(exemplifies and slightly generalises methods / concepts from 4.2)
RNA secondary structure prediction produces a set of predicted base pairs for
each RNA molecule (= problem instance);
RNA molecules are modelled as strings of bases,
in the correct secondary structure (there is only one per molecule), each base can be
- unpaired
- paired with exactly one other base (at different position)
-> base pairing can be captured as symmetric binary matrix (p_ij),
where p_ij=1 if the base at pos i is paired with the base at pos j, 0 otherwise
<= 1 pairing partner per base -> at most one 1 per row (and per column)
in a predicted structure, each base can be
- correctly unpaired (i.e., predicted unpaired & unpaired in correct structure)
- correctly paired (i.e., predicted paired with the same partner as paired with in correct structure)
- incorrectly paired (i.e., predicted paired while unpaired in correct structure
or predicted paired with a partner different from that in correct structure)
= false positive
- incorrectly unpaired (i.e., predicted unpaired while paired in correct structure)
= false negative
Note:
- this can be seen as making a decision for each base, i.e., set of decisions
per problem instance - but these are not independent,
since each base can pair with only one other base
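The per-base classification above can be sketched as follows (the toy molecule and its structures are hypothetical, not Andronescu's data); structures are represented as dicts mapping base position -> pairing partner, with unpaired bases absent:

```python
# Sketch: classify each base of a predicted RNA secondary structure against
# the correct structure, using the four categories defined above.

def per_base_counts(correct, predicted, n):
    """Count TP (correctly paired), FP (incorrectly paired),
    FN (incorrectly unpaired) and TN (correctly unpaired) over n bases."""
    tp = fp = fn = tn = 0
    for i in range(n):
        c, p = correct.get(i), predicted.get(i)
        if p is not None:          # predicted paired
            if c == p:
                tp += 1            # paired with the correct partner
            else:
                fp += 1            # wrong partner, or should be unpaired
        else:                      # predicted unpaired
            if c is None:
                tn += 1            # correctly unpaired
            else:
                fn += 1            # incorrectly unpaired
    return tp, fp, fn, tn

# toy molecule of 8 bases: correct structure pairs (0,7) and (2,5);
# the prediction gets one pair right and one wrong
correct   = {0: 7, 7: 0, 2: 5, 5: 2}
predicted = {0: 7, 7: 0, 2: 4, 4: 2}
print(per_base_counts(correct, predicted, 8))   # (2, 2, 1, 3)
```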
How should this type of algorithm be evaluated empirically?
[ask students]
- measure accuracy, various types of errors per molecule
- study distribution of these measures over benchmark set of molecules
-> correlation plots showing combinations of (intra-molecular) sensitivity vs. specificity
or equivalent measures
[show analysis of data in gnuplot]
- correlation with properties of molecules? (e.g., length = # bases?)
[show analysis of data in gnuplot]
[ask students: what further analysis do the observed results suggest?]
---
4.3 Randomised algorithms with one-sided error
Monte-Carlo Algorithms (MCAs)
- decision algorithm whose run-time is a random variable
- may produce false negative and/or false positive results
Note: Multiple runs of the algorithm on the same problem instance may produce correct results or errors.
Here: MCA with one-sided error (typically false neg)
Similarly: Generalised Las Vegas algorithms
= LVA that may not always terminate (with a result)
Behaviour on single instance (using given parameter values) is characterised by
- error probability (or success probability = 1-error prob)
- RTD for runs with correct results
- RTD for runs with incorrect results
Reduction of error probability (= amplification of success prob) by
multiple independent runs:
Key Insight:
By performing multiple independent runs of randomised MCA with one-sided error,
error probability can be traded off against run-time.
Can this be exploited in practice?
[ask students]
[two cases:
- errors can be detected efficiently -> perform runs until success
- errors cannot be detected efficiently -> multiple runs + voting scheme
(effective only if the per-run error probability is low enough)]
Multiple independent runs can be executed sequentially or in parallel.
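The first case above (detectable errors, repeat until success) can be quantified with a short sketch; numbers here are illustrative. With per-run error probability p, k independent runs all fail with probability p^k, so the number of runs needed for a target error probability follows directly:

```python
# Sketch: error-probability amplification for an MCA with one-sided error
# and efficiently detectable errors. k independent runs fail only if all
# k runs fail, i.e., with probability p**k.

import math

def runs_needed(p, target):
    """Smallest k such that p**k <= target (0 < p < 1)."""
    return math.ceil(math.log(target) / math.log(p))

p = 0.5                        # per-run error probability (illustrative)
print(runs_needed(p, 1e-6))    # 20 runs suffice: 0.5**20 ~ 9.5e-7
```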
Analysis on sets of instances:
- analyse run-time and errors separately on individual instances
and across set (for run-time, see Module 3; for error, analogously)
- correlation between run-time stats and error rate (over multiple runs)
across test set can be interesting [ask students: why?]
Investigation of trade-off between error rate and run-time:
see 4.2; keep in mind that here, occurrence of errors can vary
between runs on single instance and across instances
---
4.4 Randomised algorithms with two-sided error
MCA with two-sided error: both types of error may occur
Note: on each given instance, only one type of error can occur
[ask students: why?]
-> for single instances: same approach as in 4.3
Same approach as for MCA with one-sided error, but
- measure / study both types of error separately
- investigate correlation / trade-offs between the two types of errors
(see also 4.2)
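When errors cannot be detected, the voting scheme from 4.3 still applies here; a sketch with illustrative numbers: if the per-run error probability is p < 1/2, a majority vote over k independent runs errs only if more than half the runs err, and this probability shrinks as k grows:

```python
# Sketch: majority voting over k independent runs of an MCA with per-run
# error probability p; the vote is wrong iff more than k/2 runs are wrong
# (use odd k to avoid ties).

from math import comb

def majority_error(p, k):
    """P(majority of k independent runs is wrong), per-run error prob p."""
    return sum(comb(k, i) * p**i * (1 - p)**(k - i)
               for i in range(k // 2 + 1, k + 1))

p = 0.3   # illustrative per-run error probability
for k in (1, 11, 51):
    print(k, majority_error(p, k))
```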
---
learning goals:
- be able to explain the two types of errors that may occur in a decision algorithm with error
- be able to explain the concept and definition of a Monte-Carlo algorithm
- be able to explain and apply different measures capturing the occurrence of erroneous results
- be able to analyse trade-offs between error probability and run-time
- be able to analyse trade-offs between the two types of errors
- be able to explain and apply the concept of ROC curves
- be able to explain under which conditions and how multiple independent runs can be
used to reduce the error probability of a Monte-Carlo algorithm