Empirical Algorithmics (Spring 2006)
ICT International Doctorate School, Università degli Studi di Trento
Notes by Holger H. Hoos, University of British Columbia

----------------------------------------------------
Module 1: Introduction [1.5 classes]
----------------------------------------------------

1.1 Some motivating examples

Example 1:
You have just developed a new algorithm A that, given historical weather data, predicts whether it will rain tomorrow. You believe A is better than any existing algorithm for this problem.
Question: How do you show the superiority of your new algorithm?

Example 2:
You have implemented several heuristic algorithms for solving an airline crew scheduling problem as efficiently as possible. You observe that which algorithm performs best appears to vary considerably between different problem instances.
Question: How do you determine which algorithm performs best for which type of problem instance, and why?

Example 3:
You have implemented a sophisticated algorithm for recognising various types of cancer based on biomedical diagnostics for a cell sample. The algorithm is trained on expert-labelled data, which is difficult to obtain, and you want to achieve good performance with a minimum amount of training data.
Question: How do you determine at which point further training is no longer necessary or desirable?

Note:
- In all of these cases, the answers to the respective questions will very likely have to be determined using empirical methods.
- In particular, we need to resort to computational experiments combined with statistical analysis techniques.

Exercise (in groups):
Think about questions from your area of interest/expertise that may require empirical studies.
---

1.2 CS as an empirical science

The Three Pillars of CS:
- Theory: deals with abstract models and their properties (“eternal truths”)
- Engineering: deals with the principled design of artifacts (hardware, systems, algorithms, interfaces)
- (Empirical) Science: deals with the principled study of phenomena (behaviour of hardware, systems, algorithms; interactions)

Note:
- CS has strong roots in mathematics (-> theory) and engineering (hardware design + implementation).
- Properties of artifacts in computing are usually studied by means of theoretical analysis and/or (more or less) systematic testing.
- Many hardware and software artifacts are so complex that they cannot be analysed by theoretical means or systematic testing alone.

What is science?
Definition of “science” (according to the Merriam-Webster Unabridged Dictionary):
“3a: knowledge or a system of knowledge covering general truths or the operation of general laws especially as obtained and tested through scientific method”

The Scientific Method:
- make observations
- formulate hypothesis/hypotheses (model/theory)
- while not satisfied (and deadline not exceeded), iterate:
  1. design an experiment to challenge the model
  2. conduct the experiment
  3. analyse the experimental results
  4. revise the model based on the results

Note:
- Hypotheses are often obtained through bold (and often incorrect) generalisation.
- Formulation and revision of hypotheses is a creative task, as is (to some extent) the design of experiments.
- Experiments must be capable of producing outputs that invalidate the model.

Exercise (in groups):
How does the scientific method apply to computing?

Results:
- study of the behaviour of complex algorithms
- study of properties / behaviour of complex software / hardware components / systems
- study of interactions between systems and their users
- study of the hardness of certain computational problems
- development and study of engineering principles (algorithms, software, hardware)
- others?

Exploratory vs.
confirmatory analysis:

Most empirical studies have two components:
- exploratory analyses: collect observations, look at the data, try to see trends, regularities, patterns
  results: new hypotheses, model modifications, ideas for experiments
  often rather informal, guided by intuition and (some) experience
- confirmatory analyses: carefully design and conduct experiments to answer very specific, technical questions
  results: quantitative answers to those questions
  typically very technical, using established statistical methods (e.g., hypothesis tests)

Note:
- In many cases, researchers repeatedly switch between these two stages.
- Tools and techniques exist for both types of analyses, but exploratory analyses in particular typically require creativity, experience, and good judgement.

---

1.3 Empirical algorithmics

Goals of empirical algorithmics:
- Show that a given algorithm A has property P.
- Show that a given algorithm A is better than some other algorithm B.
- Show that a given algorithm A improves the state of the art (in solving a given problem).
[Examples - ask students]

But also:
- Design better algorithms (using empirical methods)
- Improve existing algorithms (using empirical techniques, e.g., for parameter optimisation)

Issues and problems arising in empirical algorithmics: [Ask students]
- algorithm implementation (correctness, fairness)
- parameter settings (fairness in tuning)
- selection of problem instances (benchmarks)
- performance criteria (what is measured?)
- experimental protocol
- data analysis & interpretation
- ...

How is empirical algorithmics different from other empirical sciences? [Ask students]
1. access to the lowest, ultimate level of reality = a precise and complete mathematical description of the object under study
2. complete and precise control over the object of study and the experimental environment
3. experiments are often (but not always) relatively cheap
4. discrete (vs. continuous) behaviour with observable consequences (discretisation effects)

Consequences:
2 -> perfect reproducibility of experiments (no uncertainty, no noise); typically no problems due to unknown/uncontrollable factors; instrumentation is typically relatively easy and very flexible
3 -> statistical significance of results is often easier to achieve (by means of large sample sizes)

Note:
- Large amounts of data are often easy to generate - but this is not always beneficial and can cause problems.

Two types of theory:
1. From first principles: Unlike most other systems, algorithms can be studied purely on the basis of their precise and complete mathematical description; in practice, they are often too complex for this to succeed. (In particular, high-performance heuristic algorithms for hard computational problems.)
2. From empirical observation (as in other empirical sciences)

The theory/practice gap:
For many problems, there is a considerable gap between
- the best provable performance guarantees obtained for any known algorithm and
- the practically observed performance of any known algorithm.

Note:
- The algorithms mentioned above are often very different. [why?]
- Empirically derived theory can help to explore that gap and to close it by generating new insights.

Theoretical vs. empirical analysis:
- theoretical results can inform empirical studies: suggest properties, effects
- empirical studies can guide computational theory: generate hypotheses that can then be proven theoretically

Note:
- Computational theory and theoretical analysis techniques are important - know them and use them!
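As an illustration of the confirmatory stage discussed above, a typical technical question is: “is algorithm A faster than algorithm B on this instance distribution?” The following is a minimal sketch of a two-sample permutation test on runtime samples, using only the Python standard library; the function name and the runtime figures are invented for illustration, not taken from any real experiment:

```python
import random

def perm_test_mean_diff(xs, ys, n_perm=10000, seed=0):
    """Two-sample permutation test. H0: both samples come from the
    same distribution; test statistic: difference of sample means."""
    rng = random.Random(seed)  # fixed seed -> fully reproducible analysis
    observed = sum(xs) / len(xs) - sum(ys) / len(ys)
    pooled = xs + ys
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling under H0
        diff = (sum(pooled[:len(xs)]) / len(xs)
                - sum(pooled[len(xs):]) / len(ys))
        if abs(diff) >= abs(observed):
            count += 1
    return (count + 1) / (n_perm + 1)  # p-value with +1 smoothing

# Hypothetical runtimes (in CPU seconds) of algorithms A and B
# on 8 benchmark instances each:
runtimes_a = [1.2, 0.9, 1.4, 1.1, 1.0, 1.3, 0.8, 1.2]
runtimes_b = [1.9, 2.3, 1.7, 2.1, 2.0, 1.8, 2.4, 2.2]
p = perm_test_mean_diff(runtimes_a, runtimes_b)
print(f"p-value: {p:.4f}")  # a small p-value lets us reject H0
```

A permutation test makes no distributional assumptions, which matters here because algorithm runtime distributions are often heavily skewed; with larger samples (cheap to collect, per point 3 above), established parametric tests become an option as well.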
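The consequences listed above can be made concrete: since we control every source of randomness, a run of a randomised algorithm can be replayed exactly, and since runs are cheap, large samples are easy to collect. A small sketch, using randomised quickselect as a hypothetical subject of study and the number of element comparisons as the measured performance quantity:

```python
import random
import statistics

def quickselect_comparisons(values, k, seed):
    """Return the number of comparisons randomised quickselect performs
    to find the k-th smallest element, under a fixed random seed."""
    rng = random.Random(seed)  # the only source of randomness, fully controlled
    comparisons = 0
    vals = list(values)
    while True:
        pivot = rng.choice(vals)
        smaller = [v for v in vals if v < pivot]
        equal = [v for v in vals if v == pivot]
        larger = [v for v in vals if v > pivot]
        comparisons += len(vals)  # every element is compared against the pivot
        if k < len(smaller):
            vals = smaller
        elif k < len(smaller) + len(equal):
            return comparisons
        else:
            k -= len(smaller) + len(equal)
            vals = larger

data = list(range(1000))
random.Random(42).shuffle(data)

# Consequence of point 2: perfect reproducibility -
# the same seed yields the exact same measurement.
assert quickselect_comparisons(data, 500, seed=7) == quickselect_comparisons(data, 500, seed=7)

# Consequence of point 3: experiments are cheap -
# collect a large sample over many seeds and summarise it.
sample = [quickselect_comparisons(data, 500, seed=s) for s in range(200)]
print(statistics.mean(sample), statistics.stdev(sample))
```

Note the contrast with, say, experimental physics: there is no measurement noise here at all; the only variation is the algorithm's own randomness, and even that is replayable on demand.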
---

1.4 Course overview:

module 1: introduction
module 2: deterministic algorithms for decision problems
module 3: randomised algorithms for decision problems
module 4: algorithms with error for decision problems
module 5: algorithms for optimisation problems
module 6: advanced topics

[Initial draft:
module 1: introduction
module 2: deterministic algorithms for decision problems
module 3: randomised algorithms without error (Las Vegas algorithms) for decision problems
module 4: randomised algorithms with one-sided error (Monte Carlo algorithms) for decision problems
module 5: randomised algorithms with two-sided error (Monte Carlo algorithms) for decision problems
module 6: algorithms for optimisation problems
module 7: advanced topics]

Advanced topics may include the following (depending on time + student interest):
- experimental design
- algorithm portfolios
- self-tuning mechanisms, meta-parameters
- multi-objective optimisation algorithms
- real-time algorithms
- interaction with a non-deterministic environment (humans, internet, ...)
Schedule:
mon, 29 May - 9.00-11.00, room 205, III Irst [module 1]
tue, 30 May - 9.00-11.00, room 207, III Irst [module 2]
wed, 31 May - 13.30-15.30, room 107, III Irst [modules 2, 3]
thu, 1 June - 8.30-10.30, room 201, III Irst [module 3]
fri, 9 June - 9.00-13.00, room 108, III Irst [module 4]
mon, 12 June - 9.00-11.00, room 108, III Irst [module 5]
tue, 13 June - 9.00-11.00, room 106, III Irst [module 6]
thu, 15 June - 9.00-13.00, room 106, III Irst [module 6]

Student assessment:
- 2 assignments, probably to be released around 2 June / 9 June, consisting of literature study, knowledge-testing questions, and some programming / hands-on problems; due 9 June / 15 June at the beginning of class; marked ~12 June / ~20 June [~40%]
- take-home exam: to be released thu, 15 June; due 10 July; marked ~24 July [~40%]
- in-class participation (possibly including a short presentation) [~20%]

Note:
- This is a brand-new course; parts will likely be a bit rough.
- Tell me what you like / dislike.
- Let me know when I'm too fast / too slow.
- Tell me right away when you don't understand something.
- Ask questions, contribute your comments and ideas.

Questions? Comments?

---

Learning goals (for module 1):
- be able to explain the three foundations of CS
- understand the scientific method and how it applies to computing science in general, and to the analysis of algorithms in particular
- be able to explain similarities and differences between empirical algorithmics and other empirical sciences
- be able to explain the goals of the empirical analysis of algorithms and the types of empirical studies
- be able to explain issues/problems arising in the empirical analysis of algorithms
- also: understand how the course is structured and how students are evaluated