This C++ package implements the methods described in the article by Flutre et al. The software jointly detects expression quantitative trait loci ("eQTLs") in multiple subgroups (e.g., multiple tissues). See here for more information and to download the software.
This collection of R and C code implements the hidden Markov models described in Fu et al. (2012). Click here to download the zip file containing the source code. From double-stranded binary methylation data, these models estimate several properties of DNA methyltransferases, such as their level of processivity and their preference for hemimethylated over unmethylated CpG dyads. Inference is performed by Markov chain Monte Carlo methods within a Bayesian framework. The zip file also includes several in vivo data sets collected at three genes (FMR1, G6PD and LEP), as well as previously published in vitro data sets.
This software is distributed under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or any later version. For details, see the LICENSE.txt file included with this software package.
This is a MATLAB implementation of the variational inference method for Bayesian variable selection described in a forthcoming Bayesian Analysis paper. See here for more information and to download the software.
GEMMA is the software implementing the Genome-wide Efficient Mixed Model Association algorithm for a standard linear mixed model and some of its close relatives for genome-wide association studies (GWAS). It fits a standard linear mixed model (LMM) to account for population stratification and sample structure in single-marker association tests. It also fits a Bayesian sparse linear mixed model (BSLMM) using Markov chain Monte Carlo (MCMC) to estimate the proportion of variance in phenotypes explained (PVE) by typed genotypes (i.e., chip heritability), to predict phenotypes, and to identify associated markers by jointly modeling all markers while controlling for population structure. It is computationally efficient for large-scale GWAS and uses freely available open-source numerical libraries.
See here for the software.
BRIdGE implements a Bayesian approach for identifying gene-environment interactions when paired phenotypic measurements are taken under two environmental conditions. The method explicitly considers specific interaction models while taking into account both the sample pairing and the intra-individual correlation of measurements under the two conditions. Details are given in the following publication:
See here for the software.
The software piMASS (Posterior inference using Model Averaging and Subset Selection), written and maintained by Yongtao Guan, implements MCMC-based inference methods for Bayesian variable-selection regression described in Guan and Stephens (2011).
This software was developed to perform multi-SNP association analysis on large (genome-wide) data sets, although it can also be applied to smaller association studies (e.g., candidate genes or regions), where it offers an alternative to the multi-SNP association analysis capabilities of BIMBAM (below). More generally, it may be useful for large-scale Bayesian variable-selection regression problems.
This software uses the ECME (expectation conditional maximization either) algorithm to compute a sparse, low-rank factorization of a given matrix, as described in:
Engelhardt BE, Stephens M (2010) "Analysis of population structure: a unifying framework and novel methods based on sparse factor analysis." PLoS Genetics 6(9):e1001117.
Download C++ code and instructions for SFA 1.0 and further documentation for the SFA model.
The program BIMBAM implements methods for association mapping based on those described in
Servin, B and Stephens, M (2007). Imputation-based analysis of association studies: candidate regions and quantitative traits. PLoS Genetics, 3(7):e114.
BIMBAM can handle both large association studies (e.g., genome scans) and smaller studies of candidate genes/regions.
The software is distributed under the GNU General Public License (GPL). To register and download, go here.
The program fastPHASE implements methods described in
Scheet, P and Stephens, M (2006). A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. American Journal of Human Genetics, 78:629-644.
fastPHASE can handle larger data sets than PHASE (e.g., hundreds of thousands of markers in thousands of individuals), but it does not provide estimates of recombination rates. Our experiments suggest that haplotype estimates from fastPHASE are slightly less accurate than those from PHASE, whereas its estimates of missing genotypes appear similar to, or even slightly better than, those from PHASE.
The software is free for non-commercial use, and may be licensed for commercial use. To view the terms and conditions, and then proceed to download, click here.
The program PHASE implements methods for estimating haplotypes from population genotype data described in
Stephens, M., and Donnelly, P. (2003). A comparison of Bayesian methods for haplotype reconstruction from population genotype data. American Journal of Human Genetics, 73:1162-1169.
Stephens, M., Smith, N., and Donnelly, P. (2001). A new statistical method for haplotype reconstruction from population data. American Journal of Human Genetics, 68:978-989.
Stephens, M., and Scheet, P. (2005). Accounting for Decay of Linkage Disequilibrium in Haplotype Inference and Missing-Data Imputation. American Journal of Human Genetics, 76:449-462.
The software also incorporates methods for estimating recombination rates and identifying recombination hotspots:
Crawford et al. (2004). Evidence for substantial fine-scale variation in recombination rates across the human genome. Nature Genetics.
The software is free for non-commercial use, and may be licensed for commercial use. To view the terms and conditions, and then proceed to download, click here.
Instructions for PHASE are included on the download site and are also available here.
The program SCAT (Smoothed and Continuous AssignmenTs) implements a Bayesian statistical method for estimating allele frequencies and assigning samples of unknown (or known) origin across a continuous range of locations, based on genotypes collected at distinct sampling locations. In brief, the method assumes that allele frequencies vary smoothly across the study region, so the allele frequencies at any given location are estimated from observed genotypes at nearby sampling locations, with the nearest sampling locations given the greatest weight. Details are given in
S K Wasser, A M Shedlock, K Comstock, E A Ostrander, B Mutayoba, and M Stephens. Assigning African elephant DNA to geographic region of origin: applications to the ivory trade. Proc Natl Acad Sci U S A, 101(41):14844-14852, 2004.
SCAT is available here.
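To make the smoothing intuition concrete, the following is a minimal Python sketch of a distance-weighted allele-frequency estimate. It is not SCAT's model: SCAT uses a Bayesian statistical method, whereas this sketch swaps in a plain Gaussian-kernel weighted average. The function name `smoothed_allele_frequency`, the Gaussian kernel, and the `bandwidth` parameter are all hypothetical choices made for illustration.

```python
import math


# Hypothetical illustration, NOT SCAT's actual model: a Gaussian-kernel
# weighted estimate of one allele's frequency at an arbitrary location.
def smoothed_allele_frequency(target, sites, bandwidth):
    """Estimate the frequency of one allele at `target`.

    target    -- (x, y) coordinates of the location of interest
    sites     -- list of ((x, y), allele_count, total_alleles), one entry
                 per sampling location
    bandwidth -- assumed kernel scale, in the same units as the coordinates
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    weighted_count = 0.0
    weighted_total = 0.0
    for (sx, sy), count, total in sites:
        # squared Euclidean distance from this sampling location to the target
        d2 = (sx - target[0]) ** 2 + (sy - target[1]) ** 2
        # nearer sampling locations receive larger weights
        w = math.exp(-d2 / (2.0 * bandwidth ** 2))
        weighted_count += w * count
        weighted_total += w * total
    if weighted_total == 0.0:
        raise ValueError("no sampling location carries any weight")
    return weighted_count / weighted_total


# Two sampling locations: 8 of 10 alleles at (0, 0), 2 of 10 at (10, 0).
sites = [((0.0, 0.0), 8, 10), ((10.0, 0.0), 2, 10)]
print(smoothed_allele_frequency((5.0, 0.0), sites, bandwidth=2.0))  # 0.5
print(smoothed_allele_frequency((0.0, 0.0), sites, bandwidth=1.0))  # 0.8
```

Halfway between the two sites, both receive equal weight and the estimate is the pooled frequency; at a sampling location with a small bandwidth, the estimate is dominated by that site's own data. The bandwidth controls how quickly influence decays with distance, which is the quantity a full model would need to learn from the data rather than fix in advance.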
The program HOTSPOTTER implements methods described in
N Li and M Stephens. Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics, 165(4):2213-2233, 2003.
It is freely available here.
Please direct comments and questions regarding HOTSPOTTER to Na Li, at wuolong SPAMBLOCKER AT gmail.com