Computational Evolutionary Biology

BSC 5936-6 (CRN:  11476)

Advanced computational methods are becoming increasingly important in biology. A wide range of applications, such as identifying pathogens, tracing viral transmission pathways, and reconstructing the geographic expansion of humans out of Africa, rely on evolutionary inference. This course covers the methods currently used for evolutionary inference, the stochastic models and inference principles they are based on, and how they are implemented in practice. In a separate lab session, students get hands-on experience developing software that implements these methods. We expect students to leave the course with the skills needed to develop their own ideas and methods.

Lecture:            MW 9:05-11:00 a.m.            152 DSL
Lab:                   M 11:00-1:15 p.m.             152 DSL

Prerequisites: Introduction to Bioinformatics or equivalent; Programming for Biologists or equivalent (some background in programming). Some statistics background is desirable.

Syllabus: download PDF

Date Content Material
08/29 Introduction
  • Phylogenetic trees: definitions of parts, alternative tree data structures
Lecture notes
Software assignment 1
08/31 Parsimony. Counting evolutionary change
  • Fitch parsimony
  • Wagner parsimony
  • Sankoff cost matrices
Lecture notes
09/07 Searching for the best tree(s)
  • How many trees are there
  • Explicit enumeration (exhaustive search)
  • Branch and bound method
  • Heuristic methods
Lecture notes
09/12 Stochastic models of evolution I
    • Discrete model to continuous model
    • Nucleotide models: JC, F81, HKY, GTR
    • Rate variation among sites: Gamma distribution
Lecture notes

Software assignment 2

09/14 Stochastic models of evolution II
    • Protein models
    • Codon models
    • Stem-Loop RNA model
Lecture notes
09/19 Stochastic models of evolution III
    • Infinite allele mutation model, infinite sites mutation model
    • Microsatellite mutation models: stepwise mutation model and Brownian motion model
    • Dominant markers: RFLP, AFLP
    • Substitution rate versus mutation rate
Lecture notes

Software assignment 3

09/21 Stochastic models of evolution IV
    • Quantitative characters
    • Morphology
Lecture notes
09/26 Hidden Markov models
    • General introduction to HMMs
    • Rate variation across sites: the autocorrelated gamma
    • Mixture models
Lecture notes

Software assignment 4

09/28 Maximum likelihood inference I
    • General principles
    • Coin tossing
    • Statistical consistency and efficiency
    • Conditional likelihoods
Lecture notes
10/03 Maximum likelihood inference II
    • Conditional likelihoods
    • Optimizing branch lengths
Lecture notes

Software assignment 5

10/05 Bayesian inference
    • General principle
    • Coin tossing
    • Statistical consistency and efficiency
    • Priors
Lecture notes
10/10 Markov chain Monte Carlo I
    • Gibbs and Metropolis samplers
    • Metropolis coupling
Lecture notes

Software assignment 6

10/12 Markov chain Monte Carlo II
    • Proposal distributions
    • Calculation of the Hastings ratio using Green's method
    • Convergence
Lecture notes
10/17 Review session
10/19 Midterm (closed book)
10/24 Trees and tree models
    • Molecular clocks
    • Relaxed clock models
    • Birth-death process
Lecture notes

Assignment 7

10/26 Pairwise distance method
  • Additive tree methods
  • Minimum evolution
  • Genetic distance measures
  • Neighbor-Joining
Lecture notes

Software assignment 8

10/31 Population models
The coalescent I
    • The simple n-coalescent
    • Relation to population models
Lecture notes
11/02 The coalescent II
    • The structured coalescent
    • The coalescent and other population genetic forces
Lecture notes
11/07 The coalescent III: implementation
    • Maximum likelihood
    • Bayesian inference
Lecture notes

Assignment 9

11/09 The coalescent and phylogeny
    • Speciation
    • Estimation of speciation time
Lecture notes
11/14 Phylogeography
    • Haplotype network
    • Statistical phylogeography
Lecture notes
11/16 Historical biogeography
    • General overview of problems
    • Parsimony reconstruction of vicariance scenarios
    • Parsimony study of dispersal events
    • Statistical approaches
Lecture notes
11/21 Gene trees and species trees
    • Overview of the field
    • Parsimony methods
    • Statistical methods
Lecture notes
11/23 Johan Nylander: Model selection and model averaging I
    • Hierarchical and nonhierarchical models
    • Hierarchical likelihood ratio test
    • Akaike information criterion
    • Bayesian information criterion
Lecture notes
11/28 Johan Nylander: Model selection and model averaging II
  • The Bayesian approach: Bayes factors
  • Model averaging
Lecture notes
11/30 Bootstrapping and jackknifing
    • Nonparametric and parametric approaches to confidence
    • Phylogenetic implementations
    • Bootstrap corrections
    • Bootstraps and posterior probabilities
Lecture notes
12/05 Statistical multiple sequence alignment
    • Dynamic programming
    • TKF model
    • Bayesian multiple alignments
Lecture notes
Final exam
 
Labs
  1. Parsing and printing trees (two weeks)
  2. Calculating the parsimony score of a tree (one week)
  3. Searching for the best tree (exhaustive, branch and bound)
  4. Searching for the best tree (heuristic)
  5. Simulate data on a tree (one week)
  6. Calculate likelihood of a tree (one week)
  7. Optimization of branch lengths in a maximum likelihood context
  8. MCMC sampling of phylogenetic models
  9. Simulate a coalescent tree (one week)
  10. Individual project (four weeks)
  11. Oral presentation of individual project