Q: How does PHASE perform its test of association between
haplotypes and case-control status?
The test can be thought of as
a permutation-based likelihood ratio test. Assume for convenience
that the haplotypes of all individuals were observed, rather than
estimated, and consider the likelihood ratio (LR)

          Pr(case haplotype data) x Pr(control haplotype data)
   LR  =  ----------------------------------------------------
              Pr(case and control haplotype data combined)

where the denominator represents the probability of the haplotype
data computed under the null hypothesis that they came from a single
homogeneous group, and the numerator represents the probability of the
haplotype data under the alternative hypothesis that the case and
control groups differ. Large values of LR are thus evidence for
the alternative hypothesis. The standard approach is to condition on
the estimated haplotype frequencies in computing these
probabilities. However, this approach is well known to have little
power when there are many infrequent haplotypes. To avoid this problem
we take an alternative approach, whereby the probabilities in LR are
computed using the PAC model from Li and Stephens (2003). This model
takes into account similarity of observed haplotypes, and has the
property that LR will be large if haplotypes in the case group are
more similar to one another than to haplotypes in the control
group. As a result the approach retains power even when all haplotypes
in the sample are different. To allow for uncertainty in haplotype
estimates, we find the average value of LR over many plausible
estimates for the haplotypes. Finally, to assess significance of the
resulting value for LR we compute LR in the same way for different
permutations of the case-control labels. The proportion of
permutations (including the identity permutation corresponding to the
true case-control labels) that give average values of LR greater than
or equal to the value obtained for the true case-control labels is the
significance probability reported in the _signif file. Note that the
smallest p-value attainable by this procedure is 1 divided by the
number of permutations specified (1/100 = 0.01 with the default of
100 permutations).
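
The permutation step described above can be sketched in a few lines of
Python. This is an illustration only, not PHASE's implementation: the
function names are hypothetical, the statistic is a simple
Hamming-distance stand-in rather than the PAC-model LR, and haplotypes
are treated as known rather than averaged over plausible estimates. It
also assumes that the specified number of permutations counts the true
labelling as one of them.

```python
import random
from itertools import combinations, product


def hamming(a, b):
    """Number of loci at which two equal-length haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def toy_statistic(haplotypes, labels):
    """Stand-in for LR (NOT the PAC model): mean between-group distance
    minus mean within-group distance. Larger values mean each group is
    more similar internally than to the other group."""
    cases = [h for h, lab in zip(haplotypes, labels) if lab == 1]
    controls = [h for h, lab in zip(haplotypes, labels) if lab == 0]
    between = [hamming(a, b) for a, b in product(cases, controls)]
    within = [hamming(a, b)
              for group in (cases, controls)
              for a, b in combinations(group, 2)]
    return sum(between) / len(between) - sum(within) / len(within)


def permutation_p_value(statistic, haplotypes, labels, n_perm=100, seed=0):
    """Proportion of labellings (true one included) whose statistic is
    greater than or equal to the value for the true labels."""
    rng = random.Random(seed)
    observed = statistic(haplotypes, labels)
    count = 1  # the identity permutation always satisfies >= observed
    shuffled = list(labels)
    for _ in range(n_perm - 1):
        rng.shuffle(shuffled)  # random reassignment of case/control labels
        if statistic(haplotypes, shuffled) >= observed:
            count += 1
    return count / n_perm  # can never fall below 1 / n_perm


if __name__ == "__main__":
    # Made-up data: five identical case and five identical control haplotypes.
    haps = ["0000"] * 5 + ["1111"] * 5
    labs = [1] * 5 + [0] * 5
    print(permutation_p_value(toy_statistic, haps, labs))
```

Because the true labelling is always counted, the returned value is at
least 1/n_perm, which is why more permutations are needed to resolve
smaller p-values.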
The following example, while artificial, illustrates the rationale
behind our test. Consider the following sets of case and control
haplotypes (with the two alleles at each locus denoted 0/1).