Lecture 9: Intro to Phylogenetic Tree Reconstruction

 

  1. Basics
  2.  

  3. Phylogenetic Trees
  4.  

  5. Phylogenetic Tree Reconstruction (Inference) Problem
  6. Given:

    Want: a fully labelled phylogenetic tree that 'best' explains the given data, i.e. one that maximizes a target function (score)

    Assumptions:

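    Simple Solution: enumerate all possible trees, score each one and pick the best; infeasible in practice, because the number of tree topologies grows super-exponentially with the number of leaves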
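To see why exhaustive search breaks down, here is a minimal sketch that counts the unrooted binary tree topologies on n labelled leaves. It relies on the standard counting argument, which is background rather than something stated in these notes: a tree with k-1 leaves has 2k-5 edges, and leaf k can be attached to any one of them, giving (2n-5)!! topologies in total. The function name `num_unrooted_trees` is hypothetical.

```python
def num_unrooted_trees(n: int) -> int:
    """Count unrooted binary tree topologies with n labelled leaves, i.e. (2n-5)!!."""
    count = 1  # for n <= 3 there is exactly one unrooted tree
    for k in range(4, n + 1):
        # An unrooted binary tree with k-1 leaves has 2(k-1) - 3 = 2k - 5 edges;
        # leaf k can be attached in the middle of any one of them.
        count *= 2 * k - 5
    return count


for n in (4, 5, 10, 20):
    print(n, num_unrooted_trees(n))
```

At n = 10 there are already 2,027,025 topologies, and at n = 20 there are more than 10^20, each of which would still have to be scored.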

     

  7. Distance-Based Algorithms
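  8. Idea: compute the distance dij between every pair of sequences i, j, then construct a tree whose path lengths reproduce these distances as closely as possible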
    Which distance metric?  
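The notes leave the choice of metric open. As one common option (an assumption here, not necessarily the metric used in the lecture), the sketch below computes the p-distance, i.e. the fraction of mismatched sites between two aligned sequences, and the Jukes-Cantor correction $d = -\tfrac{3}{4}\ln(1 - \tfrac{4}{3}p)$, which accounts for multiple substitutions at the same site. The function names are hypothetical, and the sequences are assumed to be aligned DNA of equal length without gaps.

```python
import math


def p_distance(s: str, t: str) -> float:
    """Fraction of aligned sites at which the two sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(a != b for a, b in zip(s, t))
    return mismatches / len(s)


def jukes_cantor(s: str, t: str) -> float:
    """Jukes-Cantor corrected distance for aligned DNA sequences."""
    p = p_distance(s, t)
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


print(p_distance("ACGTACGT", "ACGTACGA"))  # 0.125 (1 mismatch in 8 sites)
print(jukes_cantor("ACGTACGT", "ACGTACGA"))
```

The corrected distance is always at least the raw p-distance, and the gap widens as sequences diverge.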
    How to find the tree
    1. General idea: given the observed pairwise distances dij and a tree T whose path lengths predict distances dij', look at the sum of squared errors:
      $SSQ(T) = \sum_{i<j} (d_{ij} - d'_{ij})^2$
      find the T that minimizes SSQ(T) => Least Squares Method
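      but finding the optimal T is NP-complete (for a fixed topology the best edge lengths follow from linear least squares; the hard part is the search over topologies)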
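Scoring a single candidate tree is the easy part. The minimal sketch below, assuming the tree is given as a list of `(node, node, edge_length)` tuples (a hypothetical representation, as are the names `tree_distances` and `ssq`), computes the predicted distances dij' as path lengths in the tree and then the unweighted SSQ(T). It does not search over topologies.

```python
from collections import defaultdict


def tree_distances(edges):
    """Path length d'_ij between every pair of nodes of a tree given as (u, v, length) edges."""
    adj = defaultdict(list)
    for u, v, length in edges:
        adj[u].append((v, length))
        adj[v].append((u, length))
    dist = {}
    for source in adj:
        # Depth-first walk from `source`: in a tree each node is reached by exactly one path.
        stack = [(source, None, 0.0)]
        while stack:
            node, parent, so_far = stack.pop()
            dist[source, node] = so_far
            for neighbour, edge_len in adj[node]:
                if neighbour != parent:
                    stack.append((neighbour, node, so_far + edge_len))
    return dist


def ssq(observed, edges):
    """SSQ(T): sum of (d_ij - d'_ij)^2 over the leaf pairs (i, j) in `observed`."""
    predicted = tree_distances(edges)
    return sum((d_ij - predicted[i, j]) ** 2 for (i, j), d_ij in observed.items())


# Hypothetical tree: A and B hang off internal node u, C and D off internal node v.
edges = [("A", "u", 1), ("B", "u", 4), ("u", "v", 1), ("C", "v", 1), ("D", "v", 4)]
observed = {("A", "B"): 5.5, ("A", "C"): 3, ("A", "D"): 6,
            ("B", "C"): 6, ("B", "D"): 9, ("C", "D"): 5}
print(ssq(observed, edges))  # 0.25: only d_AB deviates from the tree, by 0.5
```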
    2. Clustering: UPGMA (Unweighted Pair Group Method Using Arithmetic Averages)
      • Idea: cluster sequences; at each stage, merge the two closest clusters and create a new node in the tree
      • build the tree bottom up from the leaves
      • distance dij between clusters Ci and Cj = average distance between pairs of sequences, one from each cluster:
        $d_{ij} = \frac{1}{|C_i|\,|C_j|} \sum_{p \in C_i} \sum_{q \in C_j} d_{pq}$
      • complexity: polynomial, O(n^3) for n sequences with a straightforward implementation
      • result: a rooted tree with the molecular clock property (MCP)
        • distance is proportional to evolutionary time (constant rate of evolution), so all leaves are equidistant from the root
        • not always true in reality; some lineages evolve faster than others
      • if the 'true' tree doesn't have the MCP, UPGMA can give incorrect results
      Question: can we still find the correct tree efficiently after relaxing the MCP?
      => Yes, if the distances are additive: use neighbour joining
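A minimal UPGMA sketch, assuming the distances come as a symmetric matrix `dist` indexed in the order of `names` (both hypothetical inputs). For clarity it recomputes each cluster distance directly from the averaging definition rather than with a faster update rule, and it records each merge together with the height dij/2 at which the new node sits under the molecular clock.

```python
from itertools import combinations


def upgma(dist, names):
    """UPGMA sketch: returns the merges as (cluster_i, cluster_j, node_height)."""
    index = {name: pos for pos, name in enumerate(names)}

    def cluster_distance(ci, cj):
        # Average of d_pq over all pairs with p in Ci and q in Cj.
        total = sum(dist[index[p]][index[q]] for p in ci for q in cj)
        return total / (len(ci) * len(cj))

    clusters = [(name,) for name in names]  # every sequence starts as its own cluster
    merges = []
    while len(clusters) > 1:
        # Merge the two closest clusters into a new node of the tree.
        ci, cj = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        # Under the molecular clock, the new node sits at height d_ij / 2.
        merges.append((ci, cj, cluster_distance(ci, cj) / 2))
        clusters = [c for c in clusters if c not in (ci, cj)] + [ci + cj]
    return merges


# Hypothetical clock-like (ultrametric) distances.
names = ["A", "B", "C", "D"]
dist = [[0, 2, 4, 8],
        [2, 0, 4, 8],
        [4, 4, 0, 8],
        [8, 8, 8, 0]]
for merge in upgma(dist, names):
    print(merge)
```

Efficient implementations avoid the recomputation: after merging Ci and Cj into Ck, they set $d_{kl} = \frac{|C_i|\,d_{il} + |C_j|\,d_{jl}}{|C_i| + |C_j|}$, which yields the same averages.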
    3. Neighbour Joining
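      • guaranteed to generate the correct tree in polynomial time if the distances are additive, i.e. each dij equals the sum of the edge lengths on the path between leaves i and j
        (additivity is weaker than the MCP, so more reasonable; still not always true)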
      • Idea:
        • find a pair of neighbouring leaves i, j (in the 'true' tree)
        • remove them from the set of leaves and create a new node k joining them
        • define the distance between k and every other leaf m by
          $d_{km} = \tfrac{1}{2}(d_{im} + d_{jm} - d_{ij})$
        • add k as a leaf
        • iterate until done (i.e. only 2 leaves are left), then join the last two with an edge
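      • can't just pick the pair with the minimum distance dij: if rates differ, the two closest leaves need not be neighbours
      • need to use a correction factor:
        $D_{ij} = d_{ij} - (r_i + r_j)$, where $r_i = \frac{1}{|L| - 2} \sum_{k \in L} d_{ik}$ and L is the current set of leaves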
        (refer to Durbin et al., Chapter 7)
        => for additive distances, the pair with minimum Dij is guaranteed to be neighbouring leaves in the true tree
      • Result: correct unrooted tree in polynomial time (O(n^3) with a straightforward implementation)
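A minimal neighbour-joining sketch, assuming the distances come as a symmetric matrix `dist` indexed in the order of `names`. The function name and the internal node labels `n0`, `n1`, ... are hypothetical (leaf names are assumed not to clash with them). It picks each pair by the corrected distance Dij, shrinks the matrix with the dkm update from the notes, and assigns branch lengths with the standard formula $d_{ik} = \tfrac{1}{2}(d_{ij} + r_i - r_j)$, $d_{jk} = d_{ij} - d_{ik}$.

```python
from itertools import combinations


def neighbour_joining(dist, names):
    """Neighbour-joining sketch: returns the tree as (node, node, branch_length) edges."""
    # Dict-of-dicts copy, so merged nodes can be added without rebuilding a matrix.
    d = {a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    active = list(names)  # current leaf set L
    edges = []
    next_id = 0
    while len(active) > 2:
        n = len(active)
        # r_i = 1/(|L| - 2) * sum over k in L of d_ik
        r = {i: sum(d[i][k] for k in active) / (n - 2) for i in active}
        # Choose the pair minimising the corrected distance D_ij = d_ij - (r_i + r_j).
        i, j = min(combinations(active, 2),
                   key=lambda pair: d[pair[0]][pair[1]] - r[pair[0]] - r[pair[1]])
        k = f"n{next_id}"
        next_id += 1
        # Branch lengths from the new internal node k to i and j.
        d_ik = 0.5 * (d[i][j] + r[i] - r[j])
        edges += [(i, k, d_ik), (j, k, d[i][j] - d_ik)]
        # d_km = 1/2 (d_im + d_jm - d_ij) for every other remaining leaf m.
        d[k] = {k: 0.0}
        for m in active:
            if m not in (i, j):
                d[k][m] = d[m][k] = 0.5 * (d[i][m] + d[j][m] - d[i][j])
        active = [m for m in active if m not in (i, j)] + [k]
    a, b = active
    edges.append((a, b, d[a][b]))
    return edges


# Hypothetical additive distances from the tree A-u:1, B-u:4, u-v:1, C-v:1, D-v:4.
names = ["A", "B", "C", "D"]
dist = [[0, 5, 3, 6],
        [5, 0, 6, 9],
        [3, 6, 0, 5],
        [6, 9, 5, 0]]
for edge in neighbour_joining(dist, names):
    print(edge)
```

In this example the smallest raw distance is dAC = 3, yet A and C are not neighbours. The corrected distances DAB = DCD = -12 are the smallest (all other pairs give -11), so NJ joins the true neighbour pairs and recovers the edge lengths of the generating tree.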