Ascertainment Bias

Rubinsztein et al (1995), in a study of the average sizes of microsatellite loci in humans and other primates, reported a significant bias toward greater length in humans. Based on this they suggested that there is an inherent difference between human microsatellites and those in other primates, perhaps having to do with the degree of asymmetry in the mutation process. Ellegren et al (1995), however, pointed out that the length differences could be due to ascertainment bias: microsatellites tend to be selected in a focal species (the species in which the microsatellites were first developed) to be either polymorphic or long. Since length and polymorphism are positively correlated (see above), both criteria result in loci longer than average. Amos and Rubinsztein (1995) defended their initial interpretation with a number of novel statistical approaches, but the issue of ascertainment bias itself has been largely dropped as the original participants in the debate moved on to detailed characterizations of asymmetry in the mutation distribution at particular loci in a specific taxon. This is unfortunate because whatever the level of asymmetry, and despite assertions to the contrary by Amos et al (1995), ascertainment bias will influence all interspecific comparisons and must be carefully taken into account.

In fact it is straightforward to make a quantitative assessment of ascertainment bias. Imagine that in a focal species, microsatellites are selected from a pool of loci with a range of R (that is, alleles may have any number of repeats from 1 to R). For convenience, we will refer to the average length of alleles at a locus (in one taxon) as the length of that locus. Assume that in the focal species only microsatellites longer than C repeat units are accepted for subsequent analysis (the cutoff being imposed directly by a preference for clones with long alleles, indirectly by the screening process, or by a preference for polymorphic markers). If lengths are spread evenly over the range, the selection process in the focal species results, on average, in microsatellites of length (R+C)/2. In a related but sufficiently diverged species the average length at the same loci would be R/2. If the difference in length due to ascertainment bias is denoted Da, then we have Da = (R+C)/2 - R/2 = C/2. The magnitude of the difference is therefore independent of R. This argument could be refined by taking account of various complications (especially correlations in size between the focal and related species), but the point is already clear: the absolute bias is substantial, and for moderate R it is substantial as a fraction of R. It is especially interesting to note that it is customary to focus on microsatellites with 10 or more repeats, as these are often polymorphic. Then C=10 and we predict that humans would have, on average, five more repeats than other primates, in striking agreement with the reported difference of four repeats between humans and chimpanzees. Thus, once ascertainment bias is taken into account we see that in fact there is nothing to explain with respect to the difference in average length between human and other primate microsatellites reported by Rubinsztein et al (1995).
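The arithmetic above can be checked with a short calculation. The following is only a minimal sketch of the toy model, not an analysis of real data. It assumes that locus lengths are equally likely to take any value from 1 to R, and that lengths in the related species are uncorrelated with those in the focal species. The function name ascertainment_bias is hypothetical and introduced here purely for illustration.

```python
from statistics import mean


def ascertainment_bias(R, C):
    """Expected length difference (focal minus related) under the toy model.

    Assumptions (illustrative only): locus lengths are equally likely to be
    any integer from 1 to R, and lengths in the related species are
    uncorrelated with lengths in the focal species.
    """
    pool = range(1, R + 1)               # every possible locus length
    focal = [n for n in pool if n > C]   # loci longer than the cutoff C
    if not focal:
        raise ValueError("the cutoff C must be smaller than R")
    # Focal species: mean of the selected loci.
    # Related species: mean of the whole pool, since selection in the focal
    # species carries no information about length in the related species.
    return mean(focal) - mean(pool)


# The predicted bias depends on the cutoff C but not on the range R.
for R in (20, 30, 60):
    print(R, ascertainment_bias(R, 10))
```

With C=10 the computed difference is five repeats for each of the three values of R, matching Da = C/2. Relaxing either assumption (for example, letting lengths in the two species be correlated) would shrink the bias, which is the refinement alluded to in the text.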

The point here is not to further belabor the argument of whether the differences between humans and other primates reflect some inherent, "directional" difference as claimed by Rubinsztein et al (1995). That argument should (and certainly will) be settled by comparing microsatellites first selected in other primates with those first selected in humans. The point is rather to demonstrate that differences in the average length between species are expected whenever microsatellites selected in one species are carried over to another. Since length and variability are correlated, this difference imposes a bias in the variability expected in the focal and related species. Moreover, an additional contribution to such bias arises from the preference for pure stretches of repeats in the focal species. Even in closely related species these stretches will often be interrupted by imperfections (Crouau-Roy et al. 1996, Garza and Freimer 1996). Since imperfections are known to stabilize microsatellites, this difference will further increase the contribution that ascertainment bias makes to the differences between species in variability at microsatellite loci. For these reasons it is critical that sets of microsatellites with consistent structures be used to calculate genetic distances, and especially to compare variabilities among taxa.