Variational inference for Bayesian variable selection in MATLAB
Introduction
This software package is a MATLAB implementation of the variational
inference procedure for Bayesian variable selection, as described in a
forthcoming Bayesian Analysis
paper by Peter Carbonetto and Matthew Stephens, "Scalable
variational inference for Bayesian variable selection in regression,
and its accuracy in genetic association studies." This software
has been used to implement Bayesian variable selection for large
problems with over a million variables.
All the MATLAB functions we have implemented for this project have
been tested in version 7.10 (R2010a) of MATLAB for 64-bit Linux.
An implementation for R is forthcoming.
For details on the software license, and how to install and use the
MATLAB functions, continue reading. If you have any questions,
comments or bugs to report, please contact the author.
Peter Carbonetto
Dept. of Human Genetics
University of Chicago
License

Variational inference for Bayesian variable selection by Peter
Carbonetto is licensed under a Creative Commons
Attribution-ShareAlike 3.0 Unported License.
Installation
You can download the MATLAB code for this project here.
Once you have extracted the files from the compressed tar archive,
you will need to compile the C++ code into MATLAB executables (MEX
files). First, configure MATLAB on your computer to compile MEX files,
if you haven't done so already. Then run the install.m script, which
compiles the MEX files. For details on setting up MATLAB for MEX
files, see the MathWorks website, including this
tutorial.
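In the MATLAB command window, these installation steps might look like the following. This is only a sketch: the directory name is a placeholder for wherever you extracted the archive, and mex -setup is the standard MATLAB command for selecting a compiler.

```matlab
% Select a C/C++ compiler for building MEX files (one-time setup).
mex -setup

% Change to the directory containing the extracted files.
% NOTE: 'path/to/extracted/files' is a placeholder, not the real folder name.
cd('path/to/extracted/files');

% Run the install.m script to compile the C++ code into MEX files.
install
```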
Overview of functions
We implemented over 70 MATLAB functions for this project, but to
run variational inference for Bayesian variable selection you will
only need to learn a few of them. The main functions are as follows:
- multisnp runs variational inference (the "inner
loop") given values for the hyperparameters of the model. This is
for variable selection in linear regression. We call this function
"multisnp" because we used it for joint analysis of single
nucleotide polymorphisms (SNPs) in a genome-wide association study,
but it is suitable for any problem framed as variable selection in linear
regression.
- multisnpbin is the same as multisnp, except that it
is meant for variable selection in logistic regression. This is useful
for modeling a binary-valued outcome, such as case-control status.
- multisnpsim demonstrates how to run the full variational
inference procedure for Bayesian variable selection in linear
regression. It runs both the "inner" and "outer"
loops of the inference algorithm, where the inner loop executes the
coordinate ascent updates for a given value of the hyperparameters,
and the outer loop runs importance sampling for the hyperparameters.
This is the variational inference procedure used in the two simulation
studies for the Bayesian Analysis paper. This function assumes
specific choices for priors on the hyperparameters, as described in
the paper.
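To illustrate how the outer loop can combine results from the inner loop, here is a minimal sketch of the importance-sampling averaging step. It does not call any function from this package. The variables logw and alpha are hypothetical stand-ins, filled with made-up numbers: logw holds the log of the unnormalized importance weight for each hyperparameter setting, and each column of alpha holds the inclusion probabilities that the inner loop is assumed to return for that setting.

```matlab
% Hypothetical inputs (illustrative numbers only, not real results):
%   logw(i)    - log of the unnormalized importance weight for the
%                i-th hyperparameter setting.
%   alpha(:,i) - inclusion probabilities for each variable, assumed to
%                come from the inner loop run at the i-th setting.
logw  = [-1050.2; -1047.9; -1049.1];
alpha = [0.90 0.95 0.85
         0.10 0.05 0.20
         0.02 0.01 0.03];

% Normalize the weights. Subtracting the maximum before exponentiating
% avoids numerical underflow when the log-weights are very negative.
w = exp(logw - max(logw));
w = w / sum(w);

% Average the inclusion probabilities over the hyperparameter settings,
% weighting each setting by its normalized importance weight.
PIP = alpha * w;
```

Here PIP is a column vector with one averaged inclusion probability per variable. Working with log-weights matters in practice because the unnormalized weights can be far too small to represent directly in floating point.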
To learn how to use the functions multisnp and
multisnpsim, we recommend looking carefully at the script
example1.m. This script shows how the variational approximation
can be used in combination with importance sampling for an artificial
problem with 1000 variables. This script implements a single trial of
the first simulation study (the "ideal case") in the
Bayesian Analysis paper.
Note that our implementation of variational inference assumes
specific priors for the regression coefficients and the indicator
variables. These priors are described in the Bayesian Analysis
paper. However, you have (nearly) complete freedom to choose the
priors for the hyperparameters.
To get more details on any of the functions included in this
package, you can always use the
help command in MATLAB.
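For example, assuming the package directory is on your MATLAB path, the following commands display the documentation for the three main functions:

```matlab
% Show the help text for each of the main functions.
help multisnp
help multisnpbin
help multisnpsim
```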