12.3.4 Distance models or molecular evolution models for Amino Acid sequences

The evolutionary distance between two amino acid sequences can be determined under the assumptions of a particular model of amino acid substitution. The substitution model defines a rate matrix that can be used to calculate the probability of evolving from one amino acid to another over a given time.

As with nucleotides, gaps are not penalized when using the Geneious Tree Builder. Sites with gaps are ignored when calculating pairwise distances (i.e., gaps are not treated as a 21st amino acid state).

Jukes-Cantor

This is the simplest substitution model. It assumes that all amino acids have the same equilibrium frequency, i.e., each of the 20 amino acids occurs with a frequency of 0.05 in protein sequences. This model also assumes that all amino acid substitutions occur at equal rates.
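
To make these two assumptions concrete, the following is a minimal Python sketch of the transition probabilities they imply for a 20-state equal-rate model. The function name `jc_transition_probability` is hypothetical, and the sketch assumes that branch length is measured in expected substitutions per site; it is not taken from the Geneious implementation.

```python
import math

N_STATES = 20  # number of amino acid states under this model


def jc_transition_probability(same_state, branch_length):
    """Probability of observing a given end state after `branch_length`.

    same_state: True for the probability of ending in the starting amino
        acid, False for ending in one specific different amino acid.
    branch_length: expected substitutions per site (an assumption of this
        sketch, not a documented Geneious convention).
    """
    # Exponential decay term shared by both cases.
    decay = math.exp(-(N_STATES / (N_STATES - 1)) * branch_length)
    if same_state:
        return 1 / N_STATES + (N_STATES - 1) / N_STATES * decay
    return 1 / N_STATES - decay / N_STATES
```

At a branch length of zero the sequence cannot have changed, and on very long branches every amino acid approaches the equilibrium frequency of 0.05, regardless of the starting state.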

If the proportion of non-gap, non-ambiguous sites that are mismatched between the sequences is given as p, the formula for computing the distance between the sequences is:

$$d = -\frac{19}{20} \log\left(1 - \frac{20}{19}\, p\right)$$
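
The distance calculation can be sketched in Python as follows. This is an illustration under stated assumptions rather than the Geneious code: the function name `jc_amino_acid_distance` is hypothetical, the logarithm is taken to be the natural logarithm, any character outside the 20 standard one-letter amino acid codes (gaps, `X`, `B`, `Z` and so on) is treated as a site to skip, and returning infinity when p reaches 19/20 is a choice made for this sketch.

```python
import math

# The 20 standard amino acids; anything else is treated as gap or ambiguous.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def jc_amino_acid_distance(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned amino acid sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0    # non-gap, non-ambiguous sites
    mismatched = 0  # compared sites where the residues differ
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in AMINO_ACIDS or b not in AMINO_ACIDS:
            continue  # sites with gaps or ambiguities are ignored
        compared += 1
        if a != b:
            mismatched += 1

    if compared == 0:
        raise ValueError("no comparable sites between the sequences")

    p = mismatched / compared
    if p >= 19 / 20:
        # The logarithm is undefined here; the distance is saturated.
        return math.inf
    return -(19 / 20) * math.log(1 - (20 / 19) * p)
```

For example, `jc_amino_acid_distance("A-CD", "AKCE")` skips the second column because of the gap, so p is computed as 1 mismatch out of 3 compared sites.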

Under Jukes-Cantor the number of substitutions is assumed to be Poisson distributed with a rate of $\frac{20}{19}u$, i.e., the probability of no substitutions at a given site over a branch of length $ut$ is $e^{-ut}$.
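
The distance formula follows from this model. As a sketch, assuming the standard 20-state equal-rate formulation, let $P_{ii}(d)$ (a symbol introduced here for illustration) be the probability that a site shows the same amino acid at both ends of a path of length $d$, measured in expected substitutions per site:

$$P_{ii}(d) = \frac{1}{20} + \frac{19}{20}\, e^{-\frac{20}{19} d}$$

The expected proportion of mismatched sites between two sequences separated by $d$ is then

$$p = 1 - P_{ii}(d) = \frac{19}{20}\left(1 - e^{-\frac{20}{19} d}\right)$$

and solving for $d$ gives

$$d = -\frac{19}{20} \log\left(1 - \frac{20}{19}\, p\right)$$

The logarithm is defined only for $p < \frac{19}{20}$. As $p$ approaches this value the sequences are saturated with substitutions and the estimated distance grows without bound.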

Technically, Jukes-Cantor for amino acid sequences is the Neyman model (Neyman 1971) with 20 states.
