Variational inference for Bayesian variable selection in MATLAB

Introduction

This software package is a MATLAB implementation of the variational inference procedure for Bayesian variable selection described in a forthcoming Bayesian Analysis paper by Peter Carbonetto and Matthew Stephens, "Scalable variational inference for Bayesian variable selection in regression, and its accuracy in genetic association studies." This software has been used to carry out Bayesian variable selection in large problems with over a million variables.

All the MATLAB functions we have implemented for this project have been tested in version 7.10 (R2010a) of MATLAB for 64-bit Linux.

An implementation for R is forthcoming.

For details on the software license, and how to install and use the MATLAB functions, continue reading. If you have any questions, comments or bugs to report, please contact the author.

Peter Carbonetto
Dept. of Human Genetics
University of Chicago

License


Variational inference for Bayesian variable selection
by Peter Carbonetto is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Installation

You can download the MATLAB code for this project here.

Once you have extracted the files from the compressed tar archive, you will need to compile the C++ code into MATLAB executables (MEX files). First, if you haven't done so already, configure MATLAB on your computer to compile MEX files; for details on setting up MATLAB for MEX files, see the MathWorks website, including this tutorial. Then run the install.m script to compile the MEX files.

Overview of functions

We implemented over 70 MATLAB functions for this project, but to run variational inference for Bayesian variable selection you will only need to learn a few of them. The main functions are as follows:

  • multisnp runs variational inference (the "inner loop") given values for the hyperparameters of the model. It is for variable selection in linear regression. We call this function "multisnp" because we used it for joint analysis of single nucleotide polymorphisms (SNPs) in a genome-wide association study, but it is suitable for any problem framed as variable selection in linear regression.
  • multisnpbin is the same as multisnp, except that it is meant for variable selection in logistic regression. This is useful for modeling a binary-valued outcome, such as case-control status.
  • multisnpsim demonstrates how to run the full variational inference procedure for Bayesian variable selection in linear regression. It runs both the "inner" and "outer" loops of the inference algorithm: the inner loop executes the coordinate ascent updates for given values of the hyperparameters, and the outer loop runs importance sampling over the hyperparameters. This is the variational inference procedure used in the two simulation studies in the Bayesian Analysis paper. This function assumes specific priors on the hyperparameters, as described in the paper.
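The coordinate ascent updates run by the inner loop can be sketched roughly as follows. This is a minimal pure-Python illustration, not the package's MATLAB or C++ code, and the function name coordinate_ascent and its arguments are hypothetical. It assumes a spike-and-slab linear regression in which each coefficient β_k is exactly zero with probability 1 - π and otherwise drawn from N(0, sigma·sa), where sigma is the residual variance, sa is the prior variance of included coefficients (scaled by sigma), and logodds = log(π/(1 - π)). It also assumes a fully factorized variational approximation: alpha[k] approximates the posterior inclusion probability of variable k, and mu[k] and s[k] are the mean and variance of β_k given that it is included.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def coordinate_ascent(X, y, sigma, sa, logodds, alpha, mu, n_sweeps=100):
    """Illustrative coordinate ascent for a fully factorized variational
    approximation to spike-and-slab linear regression (hypothetical API).

    X         : list of n rows, each a list of p floats
    y         : list of n floats
    sigma     : residual variance
    sa        : prior variance of included coefficients, scaled by sigma
    logodds   : prior log-odds of inclusion, log(pi / (1 - pi))
    alpha, mu : initial inclusion probabilities and posterior means
    """
    n, p = len(X), len(X[0])
    # Column sums of squares d_k = x_k'x_k and inner products x_k'y.
    d = [sum(X[i][k] ** 2 for i in range(n)) for k in range(p)]
    xy = [sum(X[i][k] * y[i] for i in range(n)) for k in range(p)]
    # Variance of beta_k given inclusion; it does not depend on alpha or mu.
    s = [sa * sigma / (sa * d[k] + 1.0) for k in range(p)]
    alpha, mu = list(alpha), list(mu)
    # Xr holds X * (alpha .* mu) and is kept current after every update.
    Xr = [sum(X[i][k] * alpha[k] * mu[k] for k in range(p)) for i in range(n)]
    for _ in range(n_sweeps):
        for k in range(p):
            r_old = alpha[k] * mu[k]
            xk_Xr = sum(X[i][k] * Xr[i] for i in range(n))
            # Mean update: add back variable k's own contribution to Xr.
            mu[k] = s[k] / sigma * (xy[k] + d[k] * r_old - xk_Xr)
            # Inclusion probability update on the log-odds scale.
            alpha[k] = sigmoid(logodds + 0.5 * (math.log(s[k] / (sa * sigma))
                                                + mu[k] ** 2 / s[k]))
            # Propagate the change in alpha[k] * mu[k] into Xr.
            r_new = alpha[k] * mu[k]
            for i in range(n):
                Xr[i] += X[i][k] * (r_new - r_old)
    return alpha, mu, s
```

Each single-variable update only needs the running product X·(alpha.*mu), kept in Xr, so one full sweep over the p variables costs O(np) operations. Bookkeeping of this kind is one reason coordinate ascent of this form can be applied to problems with very many variables.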

To learn how to use the functions multisnp and multisnpsim, we recommend looking carefully at the script example1.m. This script shows how the variational approximation can be combined with importance sampling for an artificial problem with 1000 variables; it implements a single trial of the first simulation study (the "ideal case") in the Bayesian Analysis paper.
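One simple way to combine inner-loop results in an outer loop of this kind is sketched below. As we understand the approach, the inner loop is run at each hyperparameter setting (for example, each point of a grid), the variational lower bound stands in for the log marginal likelihood, and inclusion probabilities are averaged under normalized importance weights. This is a hedged pure-Python sketch, not the weighting code in multisnpsim or example1.m: the function names normalize_log_weights and average_inclusion_probs are hypothetical, and the code assumes each log weight has already been computed as the lower bound plus the log prior of the hyperparameters minus the log proposal density.

```python
import math


def normalize_log_weights(logw):
    """Turn unnormalized log importance weights into weights summing to 1.

    Subtracting the largest log weight before exponentiating avoids
    overflow and underflow when the lower bounds are large in magnitude.
    """
    m = max(logw)
    w = [math.exp(v - m) for v in logw]
    total = sum(w)
    return [v / total for v in w]


def average_inclusion_probs(logw, alphas):
    """Importance-weighted average of inclusion probabilities.

    logw[i]   : log importance weight of hyperparameter setting i
    alphas[i] : inclusion probabilities from the inner loop at setting i
    """
    w = normalize_log_weights(logw)
    p = len(alphas[0])
    return [sum(w[i] * alphas[i][k] for i in range(len(w))) for k in range(p)]
```

For example, with logw = [0.0, math.log(3.0)] and two settings whose inclusion probabilities are [1.0, 0.0] and [0.0, 1.0], the second setting gets three times the weight of the first, and average_inclusion_probs returns approximately [0.25, 0.75].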

Note that our implementation of variational inference assumes specific priors for the regression coefficients and the indicator variables. These priors are described in the Bayesian Analysis paper. However, you have (nearly) complete freedom to choose the priors for the hyperparameters.
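For orientation, a spike-and-slab regression model of the following form is our reading of the priors involved; treat it as an assumption and check the exact parameterization against the paper. Each coefficient β_k is exactly zero unless its indicator γ_k equals 1, in which case it is normal with variance σ_a² scaled by the residual variance σ². Under this reading, the hyperparameters whose priors you choose are σ², σ_a² and π.

```latex
\begin{aligned}
y \mid X, \beta, \sigma^2 &\sim \mathcal{N}(X\beta,\ \sigma^2 I_n), \\
\beta_k \mid \gamma_k = 1, \sigma^2, \sigma_a^2 &\sim \mathcal{N}(0,\ \sigma^2 \sigma_a^2), \\
\beta_k \mid \gamma_k = 0 &= 0, \\
\gamma_k \mid \pi &\sim \mathrm{Bernoulli}(\pi), \qquad k = 1, \dots, p.
\end{aligned}
```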

To get more details on any of the functions included in this package, you can always use the help command in MATLAB.


September 6, 2011