12.3.3 Distance models or molecular evolution models for DNA sequences

The evolutionary distance between two DNA sequences can be determined under the assumption of a particular model of nucleotide substitution. The parameters of the substitution model define a rate matrix that can be used to calculate the probability of evolving from one base to another in a given period of time. This section briefly discusses some of the substitution models available for the Geneious tree builder. Most models are variations of two sets of parameters – the equilibrium frequencies and relative substitution rates.

Equilibrium frequencies refer to the background probability of each of the four bases A, C, G, T in the DNA sequences. This is represented as a vector of four probabilities $\pi = (\pi_A, \pi_C, \pi_G, \pi_T)$ that sum to 1.
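As an illustration, the frequency vector can be approximated by counting bases in a set of sequences. This is a minimal sketch only: the function name `base_frequencies` is hypothetical, and estimating the frequencies from observed counts (skipping gaps and ambiguity codes) is an assumption made here, not a description of how Geneious derives them.

```python
from collections import Counter


def base_frequencies(sequences):
    """Return observed proportions of A, C, G, T as a dict.

    Assumption for this sketch: gaps and ambiguity codes are skipped,
    so only unambiguous bases contribute to the counts.
    """
    counts = Counter()
    for seq in sequences:
        for base in seq.upper():
            if base in ("A", "C", "G", "T"):
                counts[base] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no unambiguous bases found")
    # Dividing by the total guarantees the four values sum to 1.
    return {base: counts[base] / total for base in "ACGT"}
```

Calling `base_frequencies(["AACG", "T-NT"])` counts six unambiguous bases and returns their proportions, which sum to 1 by construction.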

Relative substitution rates define the rate at which each of the transitions (A ↔ G, C ↔ T) and transversions (A ↔ C, A ↔ T, C ↔ G, G ↔ T) occur in an evolving sequence. They are represented as a 4×4 matrix with rates for substitutions from every base to every other base.
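The distinction between the two substitution classes can be sketched in a few lines: a transition stays within the purines (A, G) or within the pyrimidines (C, T), while a transversion crosses between them. The function name `substitution_type` is hypothetical and used only for illustration.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
BASES = PURINES | PYRIMIDINES


def substitution_type(a, b):
    """Classify a pair of bases as 'identical', 'transition' or 'transversion'."""
    a, b = a.upper(), b.upper()
    if a not in BASES or b not in BASES:
        raise ValueError("both arguments must be one of A, C, G, T")
    if a == b:
        return "identical"
    # Same chemical class (purine/purine or pyrimidine/pyrimidine) is a transition.
    if (a in PURINES) == (b in PURINES):
        return "transition"
    return "transversion"
```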

Gaps are not penalized when using the Geneious Tree Builder: sites with gaps are ignored when calculating pairwise distances (i.e., gaps are not treated as a fifth nucleotide state). Similarly, sites with ambiguous nucleotides are always ignored in distance calculations.
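The site-filtering rule described above can be sketched as follows. The function name `mismatch_proportion` is hypothetical; the code assumes two aligned sequences of equal length and treats any character other than A, C, G or T (gaps, N, other ambiguity codes) as a site to skip.

```python
UNAMBIGUOUS = frozenset("ACGT")


def mismatch_proportion(seq1, seq2):
    """Proportion of mismatched sites among non-gap, non-ambiguous sites."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatched = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Skip the site if either sequence has a gap or an ambiguous base.
        if a not in UNAMBIGUOUS or b not in UNAMBIGUOUS:
            continue
        compared += 1
        if a != b:
            mismatched += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatched / compared
```

For the pair `"ACGT-ANA"` and `"ACCTTAGA"`, two sites are skipped (one gap, one N), six are compared and one differs, so the proportion is 1/6.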

Jukes-Cantor

This is the simplest substitution model. It assumes that all bases have the same equilibrium base frequency, i.e., each nucleotide base occurs with a frequency of 0.25 in DNA sequences. This model also assumes that all nucleotide substitutions occur at equal rates (see Jukes and Cantor 1969).

If the proportion of non-gap, non-ambiguous sites that are mismatched between the sequences is given as p, the formula for computing the distance between the sequences is:

$$d = -\frac{3}{4} \log\left(1 - \frac{4}{3} p\right)$$

Under Jukes-Cantor, the number of substitutions is assumed to be Poisson distributed with a rate of $\frac{4}{3}u$, i.e., the probability of no substitutions at a given site over a branch of length $ut$ is $e^{-\frac{4}{3}ut}$.
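The Jukes-Cantor distance formula above translates directly into code. This is a minimal sketch with a hypothetical function name, `jukes_cantor_distance`; it uses the natural logarithm and rejects values of p at or above 0.75, where the logarithm is undefined.

```python
import math


def jukes_cantor_distance(p):
    """Jukes-Cantor distance for a mismatch proportion p (0 <= p < 0.75)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    if p == 0.0:
        return 0.0
    # d = -(3/4) * ln(1 - (4/3) * p)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

The corrected distance is always at least as large as p, because it accounts for multiple substitutions at the same site that the raw mismatch proportion cannot see. For example, p = 0.1 gives a distance of roughly 0.107.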

HKY

The HKY model allows each base to have its own equilibrium frequency, and also allows transitions to occur at a different rate from transversions (see Hasegawa et al. 1985).
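One common way to write the HKY rate matrix sets the rate from base i to base j to κ·π_j for transitions and to π_j for transversions, with each diagonal entry chosen so that the row sums to zero. The sketch below follows that parameterisation; the symbol κ (`kappa`), the function name `hky_rate_matrix` and the absence of any rate normalisation are assumptions for illustration, not a description of the Geneious implementation.

```python
BASES = "ACGT"
PURINES = frozenset("AG")


def hky_rate_matrix(pi, kappa):
    """Build an unnormalised 4x4 HKY rate matrix (rows/columns ordered A, C, G, T).

    pi    -- dict mapping each base to its equilibrium frequency (sums to 1)
    kappa -- transition rate relative to the transversion rate
    """
    if abs(sum(pi[b] for b in BASES) - 1.0) > 1e-9:
        raise ValueError("equilibrium frequencies must sum to 1")
    q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            if i == j:
                continue
            # Purine<->purine or pyrimidine<->pyrimidine is a transition.
            is_transition = (a in PURINES) == (b in PURINES)
            q[i][j] = (kappa if is_transition else 1.0) * pi[b]
        # The diagonal is still 0.0 here, so this is the off-diagonal row sum.
        q[i][i] = -sum(q[i])
    return q
```

With all four frequencies equal to 0.25 and `kappa` equal to 1, every off-diagonal entry is identical, which recovers the Jukes-Cantor assumptions.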

Tamura-Nei

This model also assumes different equilibrium base frequencies. In addition to distinguishing between transitions and transversions, it allows the two types of transitions (A ↔ G and C ↔ T) to have different rates (see Tamura & Nei 1993).
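The extra freedom in Tamura-Nei can be sketched by giving purine transitions (A ↔ G) and pyrimidine transitions (C ↔ T) their own rate multipliers. The names `tamura_nei_rate_matrix`, `kappa_purine` and `kappa_pyrimidine` are hypothetical, and the matrix is left unnormalised; this illustrates one common parameterisation rather than the Geneious implementation.

```python
BASES = "ACGT"
PURINES = frozenset("AG")


def tamura_nei_rate_matrix(pi, kappa_purine, kappa_pyrimidine):
    """Build an unnormalised 4x4 Tamura-Nei rate matrix (order A, C, G, T).

    pi               -- dict mapping each base to its equilibrium frequency
    kappa_purine     -- rate multiplier for A<->G transitions
    kappa_pyrimidine -- rate multiplier for C<->T transitions
    """
    if abs(sum(pi[b] for b in BASES) - 1.0) > 1e-9:
        raise ValueError("equilibrium frequencies must sum to 1")
    q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            if i == j:
                continue
            if a in PURINES and b in PURINES:
                rate = kappa_purine          # A<->G transition
            elif a not in PURINES and b not in PURINES:
                rate = kappa_pyrimidine      # C<->T transition
            else:
                rate = 1.0                   # transversion
            q[i][j] = rate * pi[b]
        # The diagonal is still 0.0 here, so this is the off-diagonal row sum.
        q[i][i] = -sum(q[i])
    return q
```

Setting `kappa_purine` equal to `kappa_pyrimidine` gives a single transition rate, which is the structure assumed by HKY.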