This is a set of Matlab routines I wrote for the course STAT535D: Statistical
Computing and Monte Carlo Methods
by A. Doucet. It implements
different Markov Chain Monte Carlo (MCMC) strategies for sampling from the
posterior distribution over the
parameter values for binary Probit and Logistic Regression models with a
Gaussian prior on the parameter values. Specifically, we are sampling from:

P(y=1|w,x) = f(w'x)

w ~ N(0,v)

In the above, **x** is a set of p features, **y** is the class label (-1
or 1), **w** is the parameters we want to estimate, **N(0,v)** denotes
the prior Normal distribution on **w** (with mean 0 and inverse
covariance matrix **v**). For Logistic Regression, **f(a)** is the
sigmoid function **1/(1+exp(-a))**, while for Probit Regression it is the
Gaussian cumulative distribution function.

The second Probit Regression sampling strategy (probit2Sample.m)
uses the same model, but
implements the Composition sampler of Holmes and Held ("Bayesian auxiliary
variable models for binary and polychotomous regression", 2004).
This model jointly samples **w** and **z**, by directly sampling **z**
from its marginal distribution (integrating over **w**).

**Logistic Regression**: There are 3 strategies implemented for sampling
from the Logistic model. The first strategy (logist2SampleMH.m) uses the
Metropolis-Hastings algorithm outlined in Johnson and Albert ("Ordinal Data
Modeling", Springer 1999). The Iteratively-Reweighted Least Squares
algorithm is used to find the Maximum a Posteriori (MAP) estimate of **w**, and this
value is used to initialize the Markov Chain. The Asymptotic Covariance Matrix
and an adaptively updated kernel width parameter are used to make proposals.

The 2nd strategy for the Logistic model (logist2Sample.m) is the Logistic
variant of the Holmes and Held Probit Regression sampler. Rather than having a
unit variance as in the Probit model, in the Logistic model the variances of
the **z** variables **lambda** are obtained in this model by sampling from a
Kolmogorv-Smirnov distribution. This block-Gibbs sampler updates **z** and
**w** jointly conditioned on **lambda** (as in the Probit model), then
samples **lambda** conditioned on **z** and **w**.

The 3rd strategy for the Logistic model (logist2Sample2.m) is the 2nd
block-Gibbs sampling strategy of Holmes and Held. In this second approach,
**z** and **lambda** are updated jointly given **w** (**z** is sampled
from a truncated Logistic distribution), then **w** is sampled conditioned
on **z** and **lambda**.

**Sparse Logistic Regression**: A 4th strategy was implemented for a
slightly different Logistic model (logist_FS_Sample.m). In this model, we
have an additional set of variables **gamma** that indicate whether a
variable is included in the model. The effect of this is that each sample
only depends on a subset of the variables, and sampling **gamma** lets us
examine a posterior distribution over whether each variable is 'relevant' to
the classification.
This function implements the method
described in Holmes and Held, which augments the 2nd Logistic strategy above
with reversible-jump trans-dimensional moves to update **gamma**.

s = *(X,y,v,nSamples)

(where * is one of the methods above, and the output will be nVariables by nSamples)

Also included is IRLS code that returns the MAP
estimate of the Logistic model (and optionally the Asymptotic Covariance matrix). This code has the
interface:

w = L2LogReg_IRLS(X,y,v)

The complete set of .m files are available here. The report for this class project is available here. Some of the samplers also use RANDRAW.

The blogreg package contains many sub-directories that must be present on the Matlab path for the files to work. You can add these sub-directories to the Matlab path by typing (in Matlab) 'addpath(genpath(blogerg_dir))', where 'blogreg_dir' is the directory that the zip file is extracted to.

Note that a bug in the *sampleLambda.m* function was fixed on July 18, 2013 (thanks to Sharon Chiang for pointing this out).

- M. Schmidt.
**Bayesian Logistic Regression through Auxiliary Variables**. CS535D Project Report, 2006.

Mark Schmidt > Software > blogreg