CPSC 536H: Empirical Algorithmics (Spring 2008)
Notes by Holger H. Hoos, University of British Columbia
---------------------------------------------------------------------------------------
Module 4: Algorithms with error for decision problems
---------------------------------------------------------------------------------------
4.1 Introduction
For many decision problems there are no algorithms that always
find a correct solution for a given problem instance
(or always find it efficiently enough).
-> decision algorithms with error
Examples:
- local search algorithms for hard combinatorial problems like SAT
- algorithms for solving classification problems in machine learning
- heuristic construction algorithms for shortest paths, spanning trees, ...
Types of errors:
- false negatives (FN): incorrectly return "no" answer
- false positives (FP): incorrectly return "yes" answer
(note: these correspond to type I / type II errors in statistical hypothesis testing)
Algorithms with:
- one-sided error: only one type of error may occur (often: false negative)
- two-sided error: both types of error may occur
Closely related to algorithms with one-sided error:
non-termination / 'no solution found' (censored data)
---
4.2 Deterministic decision algorithms with error
Errors occur deterministically on certain instances.
Performance of algorithm on single instance is now characterised by
- type of error made
- run-time
Note: In some cases, errors can be easily detected, in others not.
(This has practical implications on effort required for empirical analysis.)
Performance of algorithm on set of instances S is characterised by
[ask students]
- relative error (for one or both types of errors) = FN/|S|, FP/|S|
- SCDs for correct results, false neg results, false pos results
It is often very interesting to investigate how instance properties
correlate with error rate(s).
[ask students: what about parameterised alg?]
For parameterised algorithms, the correlation of parameter values
with error rate(s) is also of substantial interest.
Often, there are one or more parameters whose setting(s) control
trade-off between
- run-time and error rate(s)
- two types of errors (for algorithms with two-sided errors)
[ask students: examples?]
studying correlation between run-time / error rate(s)
as parameter(s) change can be insightful
(and useful for calibrating algorithm)
instead of error probabilities (= false pos / false neg rates), often consider
- sensitivity SE = TP/(TP+FN) = fraction of "yes" instances correctly solved (aka true positive rate, TPR)
Note: TP+FN = total number of "yes" instances
- specificity SP = TN/(TN+FP) = fraction of "no" instances correctly solved (aka true neg rate, TNR)
Note: TN+FP = total number of "no" instances
also:
- false pos rate FPR = 1-SP, false neg rate FNR = 1-SE
- pos predictive value PPV = TP/(TP+FP) = fraction of "yes" answers that are correct
Note: TP+FP = total number of "yes" answers
- neg predictive value NPV = TN/(FN+TN) = fraction of "no" answers that are correct
Note: FN+TN = total number of "no" answers
- F-measure = 2*PPV*SE/(PPV+SE) = harmonic mean of PPV, SE
Note: PPV and sensitivity are also known as 'precision' and 'recall', respectively, in information retrieval
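The measures above can be computed directly from the four confusion-matrix counts; a minimal sketch (counts here are made up for illustration), with variable names following the notes:

```python
# Sketch: computing the measures defined above from raw confusion-matrix
# counts TP, FP, TN, FN (assumes all denominators are nonzero).

def confusion_measures(tp, fp, tn, fn):
    """Return the error/accuracy measures defined above as a dict."""
    se = tp / (tp + fn)             # sensitivity = TPR (recall); TP+FN = # "yes" instances
    sp = tn / (tn + fp)             # specificity = TNR; TN+FP = # "no" instances
    ppv = tp / (tp + fp)            # positive predictive value (precision)
    npv = tn / (fn + tn)            # negative predictive value
    f = 2 * ppv * se / (ppv + se)   # F-measure = harmonic mean of PPV and SE
    return {"SE": se, "SP": sp, "PPV": ppv, "NPV": npv,
            "FPR": 1 - sp, "FNR": 1 - se, "F": f}

# illustrative counts: 40 "yes" instances (32 solved), 60 "no" instances (54 solved)
m = confusion_measures(tp=32, fp=6, tn=54, fn=8)
print(m["SE"], m["SP"])   # 0.8 0.9
```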
receiver operating characteristic (ROC) curves:
graphical plot of sensitivity vs. (1 - specificity) as some algorithm (or process) parameter is varied
(typically used for binary classifiers as classification threshold is varied)
[see http://en.wikipedia.org/wiki/Receiver_operating_characteristic,
http://www.cs.ucl.ac.uk/staff/W.Langdon/roc/,
http://www.anaesthetist.com/mnm/stats/roc/ - nice interactive demonstrations!]
[slides]
Note: there are quantitative measures to summarise ROC curves, but these always lose information
about the underlying trade-off between the error types
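As a sketch of the threshold-sweep construction (scores and labels below are made up, not data from the notes), the following traces the (FPR, TPR) = (1-SP, SE) points of a ROC curve and computes the area under the curve (AUC), one of the quantitative summary measures mentioned above:

```python
# Sketch: ROC points from a binary classifier's scores, obtained by sweeping
# the classification threshold; each threshold yields one (FPR, TPR) point.

def roc_points(scores, labels):
    """Return (FPR, TPR) points, sorted, for all distinct thresholds."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = set()
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.add((fp / neg, tp / pos))
    return sorted(points | {(0.0, 0.0), (1.0, 1.0)})

def auc(points):
    """Area under the ROC curve by the trapezoid rule (a common summary;
    note that any such scalar summary loses trade-off information)."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# toy data: higher score = stronger evidence for a "yes" answer
scores = [0.9, 0.8, 0.7, 0.55, 0.5, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,    0,   1,   0,   0]
pts = roc_points(scores, labels)
print(auc(pts))   # 0.8125
```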
--
mini case study: RNA secondary structure prediction data from Mirela Andronescu (UBC/CS)
(exemplifies and slightly generalises methods / concepts from 4.2)
RNA secondary structure prediction produces a set of predicted base pairs for
each RNA molecule (= problem instance);
RNA molecules are modelled as strings of bases,
in the correct secondary structure (there is only one per molecule), each base can be
- unpaired
- paired with exactly one other base (at different position)
-> base pairing can be captured as symmetric binary matrix (p_ij),
where p_ij=1 if the base at pos i is paired with the base at pos j, 0 otherwise
<= 1 pairing partner per base -> at most one 1 per row (and per column)
in a predicted structure, each base can be
- correctly unpaired (i.e., predicted unpaired & unpaired in correct structure)
- correctly paired (i.e., predicted paired with the same partner as paired with in correct structure)
- incorrectly paired (i.e., predicted paired while unpaired in correct structure
or predicted paired with a partner different from that in correct structure)
= false positive
- incorrectly unpaired (i.e., predicted unpaired while paired in correct structure)
= false negative
Note:
- this can be seen as making a decision for each base, i.e., set of decisions
per problem instance - but these are not independent,
since each base can pair with only one other base
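The per-base classification above can be sketched as follows (the toy molecule and its structures are hypothetical, not Andronescu's data); structures are represented as dicts mapping base position -> pairing partner, with unpaired bases absent:

```python
# Sketch: classify each base of a predicted RNA secondary structure against
# the correct structure, using the four categories defined above.

def per_base_counts(correct, predicted, n):
    """Count TP (correctly paired), FP (incorrectly paired),
    FN (incorrectly unpaired) and TN (correctly unpaired) over n bases."""
    tp = fp = fn = tn = 0
    for i in range(n):
        c, p = correct.get(i), predicted.get(i)
        if p is not None:          # predicted paired
            if c == p:
                tp += 1            # paired with the correct partner
            else:
                fp += 1            # wrong partner, or should be unpaired
        else:                      # predicted unpaired
            if c is None:
                tn += 1            # correctly unpaired
            else:
                fn += 1            # incorrectly unpaired
    return tp, fp, fn, tn

# toy molecule of 8 bases: correct structure pairs (0,7) and (2,5);
# the prediction gets one pair right and one wrong
correct   = {0: 7, 7: 0, 2: 5, 5: 2}
predicted = {0: 7, 7: 0, 2: 4, 4: 2}
print(per_base_counts(correct, predicted, 8))   # (2, 2, 1, 3)
```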
How should this type of algorithm be evaluated empirically?
[ask students]
- measure accuracy, various types of errors per molecule
- study distribution of these measures over benchmark set of molecules
-> correlation plots showing combinations of (intra-molecular) sensitivity vs. specificity
or equivalent measures
[show analysis of data in gnuplot]
- correlation with properties of molecules? (e.g., length = # bases?)
[show analysis of data in gnuplot]
[ask students: what further analysis do the observed results suggest?]
---
4.3 Randomised algorithms with one-sided error
Monte-Carlo Algorithms (MCAs)
- decision algorithm whose run-time is a random variable
- may produce false negative and/or false positive results
Note: Multiple runs of the algorithm on the same problem instance may produce correct results or errors.
Here: MCA with one-sided error (typically false neg)
Similarly: Generalised Las Vegas algorithms
= LVA that may not always terminate (with a result)
Behaviour on single instance (using given parameter values) is characterised by
- error probability (or success probability = 1-error prob)
- RTD for runs with correct results
- RTD for runs with incorrect results
Reduction of error probability (= amplification of success prob) by
multiple independent runs:
Key Insight:
By performing multiple independent runs of randomised MCA with one-sided error,
error probability can be traded off against run-time.
Can this be exploited in practice?
[ask students]
[two cases:
- errors can be detected efficiently -> perform runs until success
- errors cannot be detected efficiently -> multiple runs + voting scheme
(effective only if the per-run error probability is low enough)]
Multiple independent runs can be executed sequentially or in parallel.
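The first case above (detectable errors, repeat until success) can be quantified with a short sketch; numbers here are illustrative. With per-run error probability p, k independent runs all fail with probability p^k, so the number of runs needed for a target error probability follows directly:

```python
# Sketch: error-probability amplification for an MCA with one-sided error
# and efficiently detectable errors. k independent runs fail only if all
# k runs fail, i.e., with probability p**k.

import math

def runs_needed(p, target):
    """Smallest k such that p**k <= target (0 < p < 1)."""
    return math.ceil(math.log(target) / math.log(p))

p = 0.5                        # per-run error probability (illustrative)
print(runs_needed(p, 1e-6))    # 20 runs suffice: 0.5**20 ~ 9.5e-7
```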
Analysis on sets of instances:
- analyse run-time and errors separately on individual instances
and across set (for run-time, see Module 3; for error, analogously)
- correlation between run-time stats and error rate (over multiple runs)
across test set can be interesting [ask students: why?]
Investigation of trade-off between error rate and run-time:
see 4.2; keep in mind that here, occurrence of errors can vary
between runs on single instance and across instances
---
4.4 Randomised algorithms with two-sided error
MCA with two-sided error: both types of error may occur
Note: on each given instance, only one type of error can occur
[ask students: why?]
-> for single instances: same approach as in 4.3
Same approach as for MCA with one-sided error, but
- measure / study both types of error separately
- investigate correlation / trade-offs between the two types of errors
(see also 4.2)
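When errors cannot be detected, the voting scheme from 4.3 still applies here; a sketch with illustrative numbers: if the per-run error probability is p < 1/2, a majority vote over k independent runs errs only if more than half the runs err, and this probability shrinks as k grows:

```python
# Sketch: majority voting over k independent runs of an MCA with per-run
# error probability p; the vote is wrong iff more than k/2 runs are wrong
# (use odd k to avoid ties).

from math import comb

def majority_error(p, k):
    """P(majority of k independent runs is wrong), per-run error prob p."""
    return sum(comb(k, i) * p**i * (1 - p)**(k - i)
               for i in range(k // 2 + 1, k + 1))

p = 0.3   # illustrative per-run error probability
for k in (1, 11, 51):
    print(k, majority_error(p, k))
```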
---
learning goals:
- be able to explain the two types of errors that may occur in a decision algorithm with error
- be able to explain the concept and definition of a Monte-Carlo algorithm
- be able to explain and apply different measures capturing the occurrence of erroneous results
- be able to analyse trade-offs between error probability and run-time
- be able to analyse trade-offs between the two types of errors
- be able to explain and apply the concept of ROC curves
- be able to explain under which conditions and how multiple independent runs can be
used to reduce the error probability of a Monte-Carlo algorithm