PHASE: A program for reconstructing haplotypes from population data
Most recent version information:
The most recent version of PHASE is v2.1.1, which features improved mixing for larger data sets compared with v2.1, and performs some (very) rudimentary checking of the input file format. (For many data sets v2.1 and v2.1.1 will give very similar answers, but for some data sets the results from v2.1.1 can be considerably better.)
The most recent major release was v2.1, which introduced more flexible models for recombination (including recombination hotspots) and fixed a couple of bugs in v2.0.2.
Description
PHASE v2.1 is a program implementing the method for reconstructing haplotypes from population data, described in
 Stephens, M., Smith, N., and Donnelly, P. (2001). A new statistical method for haplotype reconstruction from population data. American Journal of Human Genetics, 68:978-989.
 Stephens, M., and Donnelly, P. (2003). A comparison of Bayesian methods for haplotype reconstruction from population genotype data. American Journal of Human Genetics, 73:1162-1169.
 Stephens, M., and Scheet, P. (2005). Accounting for Decay of Linkage Disequilibrium in Haplotype Inference and Missing-Data Imputation. American Journal of Human Genetics, 76:449-462.
The software also incorporates methods for estimating recombination rates, and identifying recombination hotspots, as described in
 Li, N., and Stephens, M. (2003). Modelling Linkage Disequilibrium, and identifying recombination hotspots using SNP data. Genetics, 165:2213-2233.
 Crawford et al. (2004). Evidence for substantial fine-scale variation in recombination rates across the human genome. Nature Genetics, 36:700-706.
How to cite this software
Please see user instructions for how to cite the software.
In all cases, please specify the version of the software used, and any deviations from the default options.
PHASE is available under the following open source license.
I distribute executables of version 2.1.1 for Linux and Microsoft Windows. Contributed executables may be available for other platforms (see below). Source code (C++) is also available (below) for those who wish to compile it on other operating systems. If you do manage to compile the program successfully on another platform, and wish to contribute an executable for others to use, please email me the executable, and I will add it to this website.
Executables for v2.1.1
PHASE source code is available here.
Supplementary files
All these files are supplied with the Linux executable, but may not be supplied with other executables, so here they are for separate download if you need them. (You should be able to save these files on your computer by right-clicking on each of them, one at a time.)
Results and simulated data files from the paper
The following resources are provided for the convenience of researchers who wish to do comparisons on the data sets used to produce Figures 2 and 3 in the paper.
Summary of results from Figures 2 and 3
Here is a text file of the results shown in Figures 2 and 3 in the paper.
Here is a .zip archive containing files of simulated data used for the manuscript by Stephens, Smith and Donnelly. (If you are using UNIX you should be able to extract the files using unzip truthfiles.zip.)
The archive contains files with names of the following forms:
Here is a .zip archive containing files of results from our method. The names of the files are similar to those described above; hopefully it is obvious which results correspond to which data sets. Each row of each file contains the results for a single data set. Columns 32, 33, and 37 contain the important quantities: column 32 is the number of ambiguous individuals in the data set, column 33 is the number of these that our method got wrong (so the average over data sets of column 33 divided by column 32 is the "error rate" from our paper), and column 37 contains the I_F score for our method. Rows that contain mostly NAs correspond to data sets with more possible haplotypes than our implementation of EM could cope with; the results from these data sets were ignored in our analyses.
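For anyone wanting to recompute these summaries, here is a rough Python sketch of how one results file might be read. It is not part of PHASE and is not how the published numbers were produced. It assumes that fields are separated by whitespace, that columns are counted from 1, that missing values appear literally as NA, and that the error rate for a single data set is the number wrong (column 33) divided by the number ambiguous (column 32). The function name summarize_results is made up for illustration.

```python
from statistics import mean

# Column positions, counted from 1 as in the description above.
COL_AMBIGUOUS = 32  # number of ambiguous individuals
COL_WRONG = 33      # number of those inferred incorrectly
COL_IF_SCORE = 37   # I_F score


def summarize_results(lines):
    """Return (mean error rate, mean I_F score) over usable rows.

    Assumptions (not confirmed by the PHASE documentation):
    whitespace-separated fields, and "NA" marking missing values.
    Either element of the result is None if no row contributes to it.
    """
    error_rates = []
    if_scores = []
    for line in lines:
        fields = line.split()
        if len(fields) < COL_IF_SCORE:
            continue  # blank or truncated row
        picked = [fields[c - 1] for c in (COL_AMBIGUOUS, COL_WRONG, COL_IF_SCORE)]
        if "NA" in picked:
            continue  # data set that was ignored in the analyses
        n_ambiguous, n_wrong, if_score = (float(x) for x in picked)
        if_scores.append(if_score)
        if n_ambiguous > 0:
            # Assumption: a data set with no ambiguous individuals
            # has no defined error rate, so it is left out of the average.
            error_rates.append(n_wrong / n_ambiguous)
    return (
        mean(error_rates) if error_rates else None,
        mean(if_scores) if if_scores else None,
    )
```

The function accepts any iterable of text lines, so an open file handle for one of the extracted results files can be passed to it directly.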