Genetic Distance

Introduction

One of these days we'll have a history of the development of genetic distance measures--but not yet. Goldstein and Pollock's review provides a comprehensive look at current developments. The distance measures discussed below can be calculated for microsatellite data using the microsat program.

Distance Measures

Delta mu
The delta mu genetic distance (Ddm) for microsatellites (1), and the closely related D1 (average square distance) (2), were derived using the analytical theory developed by Moran (3) for the distribution of alleles mutating under a strict stepwise mutation process in a population of finite constant size with non-overlapping generations. Although initially developed for the analysis of protein data, this model seems fairly appropriate for microsatellite data. Moran showed that, with no constraint on allelic state, this distribution wanders without bound but keeps a constant variance.

Using this model and taking as a random variable the squared difference in number of repeats between alleles of two populations A and B, we showed that its average has a linear expectation over time. Thus we defined the average square distance between two populations at a given locus as:

D1 = sum over alleles i and i' of ((i-i')² ni ni')

where ni is the number of alleles with i repeats (the prime indicating another population), and

E[D1(t)] = 2 (2N-1) m + 2 m Tau	[1] 

where N is the population size, m is the mutation rate, and t and Tau both denote time in generations. Thus, the expected distance is linear in time, with a slope equal to twice the mutation rate.
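As a minimal sketch of the definition of D1 and of equation 1, the Python below computes D1 at a single locus. It assumes that ni is expressed as a relative frequency, so that D1 is an average over all between-population pairs of alleles. The function names and the sample repeat counts are hypothetical and are not part of the microsat program.

```python
from itertools import product


def allele_frequencies(repeat_counts):
    """Turn a list of observed repeat counts into {repeats: relative frequency}."""
    total = len(repeat_counts)
    freqs = {}
    for repeats in repeat_counts:
        freqs[repeats] = freqs.get(repeats, 0.0) + 1.0 / total
    return freqs


def average_square_distance(freq_a, freq_b):
    """D1 at one locus: sum over i, i' of (i - i')^2 * n_i * n'_i'."""
    return sum(
        (i - j) ** 2 * freq_a[i] * freq_b[j]
        for i, j in product(freq_a, freq_b)
    )


def expected_d1(pop_size, mutation_rate, generations):
    """Equation 1: E[D1] = 2(2N - 1)m + 2m * Tau."""
    equilibrium_term = 2 * (2 * pop_size - 1) * mutation_rate
    divergence_term = 2 * mutation_rate * generations
    return equilibrium_term + divergence_term


# Hypothetical sample: repeat counts observed at one locus in two populations.
pop_a = allele_frequencies([10, 10, 11, 12])
pop_b = allele_frequencies([13, 14, 14, 15])
d1 = average_square_distance(pop_a, pop_b)

# Hypothetical parameters: N = 1000, m = 0.001, Tau = 500 generations.
expectation = expected_d1(1000, 0.001, 500)
```

The first term of `expected_d1` does not depend on time, so only the second term contributes to the slope of 2m per generation.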

This distance, along with its family of related distances, was shown to be superior to other distances for microsatellites in that it is the only one whose expectation is linear in time, making it, in principle, a good distance for evolutionary studies. Its drawback, however, is a large variance, so it is useful for phylogenetic reconstruction only when a large amount of data is available and the taxa have been separate for a substantial amount of time. A practical complication is that the time span one can explore with microsatellite data is limited by the apparent constraint on allele length variation that has been observed (4). An approximate estimate of the time to linearity of D1 can be obtained from:

Tau(R) = ((R²-1)/6) / (2m)			[2] 

where R is the allele range, which can be conservatively estimated as the difference between the maximal and minimal allele size observed.
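Equation 2 can be evaluated directly, as in the rough sketch below. It reads the formula as (R²-1)/6 divided by the slope 2m, which is an assumption about the intended precedence. The function name, the observed allele sizes, and the mutation rate are hypothetical values chosen only for illustration.

```python
def time_to_linearity(allele_range, mutation_rate):
    """Equation 2: Tau(R) = ((R^2 - 1) / 6) / (2m), in generations."""
    if mutation_rate <= 0:
        raise ValueError("mutation_rate must be positive")
    range_term = (allele_range ** 2 - 1) / 6
    return range_term / (2 * mutation_rate)


# Hypothetical allele sizes (in repeats) observed at one locus.
observed_sizes = [8, 9, 11, 14, 17]

# Conservative range estimate: largest minus smallest observed allele size.
allele_range = max(observed_sizes) - min(observed_sizes)

# Hypothetical mutation rate per generation.
tau = time_to_linearity(allele_range, 5e-4)
```

Because R enters as a square, a modest increase in the estimated range lengthens the estimated time considerably, so the choice of range estimate matters.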

In order to improve on D1 the delta mu measure of distance (Ddm) for microsatellites was derived (1). This distance concentrates not on the mean squared difference but on the squared mean difference between alleles of two populations. Thus:

Ddm = (µ(A)-µ(B))²				[3] 

where µ(A) is the mean allele size for population A. It was shown (2) that this distance is also linear with time with expectation:

E[Ddm(t)] = 2 β Tau					[4] 

where the mutation rate m=β/2. Furthermore, this distance has a smaller variance than D1. More importantly, and unlike D1, in populations at mutation-drift equilibrium delta mu is independent of population size (compare equations 1 and 4). This allows direct estimates of time since population separation to be made without additional parameter estimates, provided one has an estimate of the mutation rate. This distance was derived by noting that:

D1 = V(A) + V(B) + (µ(A)-µ(B))²

where V(A) represents the variance in repeat number within population A. This variance depends on the population size and is constant once the population is at mutation-drift equilibrium, but it inflates the variance of D1.
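The following sketch computes Ddm from equation 3 and checks the decomposition of D1 numerically. It assumes that allele counts are expressed as relative frequencies and that V is the population variance (no n-1 correction), under which the identity holds exactly. All function names and sample frequencies are hypothetical. The slope passed to `separation_time` is left to the caller, who would take it from equation 4 and an independent mutation rate estimate; the value used here is illustrative only.

```python
def mean_allele_size(freqs):
    """Mean repeat number, mu, from {repeats: relative frequency}."""
    return sum(size * f for size, f in freqs.items())


def allele_size_variance(freqs):
    """Within-population variance V (population form, no n-1 correction)."""
    mu = mean_allele_size(freqs)
    return sum(f * (size - mu) ** 2 for size, f in freqs.items())


def delta_mu_squared(freq_a, freq_b):
    """Equation 3: Ddm = (mu(A) - mu(B))^2."""
    return (mean_allele_size(freq_a) - mean_allele_size(freq_b)) ** 2


def average_square_distance(freq_a, freq_b):
    """D1: sum over i, i' of (i - i')^2 * n_i * n'_i'."""
    return sum(
        (i - j) ** 2 * fa * fb
        for i, fa in freq_a.items()
        for j, fb in freq_b.items()
    )


def separation_time(ddm, slope_per_generation):
    """Generations since separation, assuming E[Ddm] grows linearly from zero."""
    if slope_per_generation <= 0:
        raise ValueError("slope_per_generation must be positive")
    return ddm / slope_per_generation


# Hypothetical allele frequencies at one locus in populations A and B.
freq_a = {10: 0.5, 11: 0.25, 12: 0.25}
freq_b = {13: 0.25, 14: 0.5, 15: 0.25}

ddm = delta_mu_squared(freq_a, freq_b)
d1 = average_square_distance(freq_a, freq_b)

# Right-hand side of the decomposition: V(A) + V(B) + (mu(A) - mu(B))^2.
decomposed = allele_size_variance(freq_a) + allele_size_variance(freq_b) + ddm

# Illustrative slope only; substitute the slope implied by equation 4.
generations = separation_time(ddm, slope_per_generation=1e-3)
```

In this example the two within-population variances account for the whole gap between D1 and Ddm, which is the part of D1 that depends on population size.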

Other distance measures

References

  1. Goldstein, D.B., Ruíz-Linares, A., Feldman, M. and Cavalli-Sforza, L.L. (1995) Genetic absolute dating based on microsatellites and the origin of modern humans. Proceedings of the National Academy of Sciences USA, 92:6720-6727.
  2. Goldstein, D.B., Ruíz-Linares, A., Feldman, M. and Cavalli-Sforza, L.L. (1995) An evaluation of genetic distances for use with microsatellite loci. Genetics, 139:463-471.
  3. Moran, P.A.P. (1975) Wandering distributions and the electrophoretic profile. Theoretical Population Biology, 8:318-330.
  4. Bowcock, A.M., Ruíz-Linares, A., Tomfohrde, J., Minch, E., Kidd, J.R., Cavalli-Sforza, L.L. (1994) High resolution human evolutionary trees with polymorphic microsatellites. Nature, 368:455-457.
  5. Cavalli-Sforza, L.L., and Bodmer, W.F. (1971) The Genetics of Human Populations, p. 399, San Francisco, W.H. Freeman and Company.
  6. Dubois, D., and Prade, H. (1980) Fuzzy Sets and Systems: Theory and Applications, p. 24, New York, Academic Press.
  7. Wright, S. (1969) Evolution and the Genetics of Populations, vol. 2, p. 295, Chicago, University of Chicago Press.
  8. Slatkin, M. (1995) A measure of population subdivision based on microsatellite allele frequencies. Genetics, 139:457-462.
  9. Nei, M. (1972) Genetic distance between populations. American Naturalist, 106:283-292.
  10. Nei, M. (1978) Estimation of average heterozygosity and genetic distance from a small number of individuals. Genetics, 89:583-590.
  11. Weber, W. and Wong, C. (1993) Human Molecular Genetics, 2:1123-1128.
  12. Reynolds, J., Weir, B.S., and Cockerham, C.C. (1983) Estimation of the coancestry coefficient: basis for a short-term genetic distance. Genetics, 105:767-779.
  13. Shriver, M.D., Jin, L., Boerwinkle, E., Deka, R., Ferrell, R.E., and Chakraborty, R. (1995) A novel measure of genetic distance for highly polymorphic tandem repeat loci. Mol. Biol. Evol., 12(5):914-920.
  14. Nei, M. (1978) Estimation of average heterozygosity and genetic distance from a small number of individuals. Genetics, 89:583-590.

Please email comments to the authors of this document: Eric Minch, Andres Ruíz-Linares, or David Goldstein.
