crfChain

Mark Schmidt, Kevin Swersky (2008)

This package is a set of Matlab functions for chain-structured conditional random fields (CRFs) with categorical features. The code implements decoding (with the Viterbi algorithm), inference (with the forwards-backwards algorithm), sampling (with the forwards-filter bacwards-sample algorithm), and parameter estimation (with a limited-memory quasi-Newton algorithm) in these models. Several of the functions have been implemented in C as mex files to speed up calculations.

The functions use a data matrix X and a label vector y. Each element i in y gives the label at position i, among the values 0-k (for some k). The label 0 is special, it indicates the position between 'sentences'. Each element (i,j) in the data matrix X gives the value of feature j at position i, among the values 0-f (for some f that can be different for each feature). A feature value of 0 is special, it indicates that the feature is ignored at this position (this is used in natural language applications to represent words that don't occur very often). The features at positions with label values of 0 are ignored.

The model uses four sets of parameters:
wi,j,k: The potential of state j given that feature i takes the value k.
vstart,j: The potential of state j to start a sentence.
vend,j: The potential of state j to end a sentence.
vi,j: The potential for the transition from state i to j.

The specific model for a single sentence of length s where each word has f features is:

p(y1:s | x1:s, w, vstart, vend, v) = (1/Z)exp( &Sigma{i=1:s}&Sigma{j=1:f}[wj,yi,xi,j] + vstart,y1 + vend,ys + &Sigma{i=1:s-1}[vyi,i+1] )

The value of Z is chosen so that the sum over assignment of y1:s is equal to one. To add a bias for each label, you can append a column of ones to X.

Usage

Given X and y in the appropriate format, the data structures used by the code can be initialized using:

nStates = max(y);
nFeatures = max(X);
[w,v_start,v_end,v] = crfChain_initWeights(nFeatures,nStates,'zeros');
featureStart = cumsum([1 nFeatures(1:end)]);
sentences = crfChain_initSentences(y);

We can make the potential functions, do decoding, do inference, and generate 10 conditional samples for a sentence s using:

[nodePot,edgePot]=crfChain_makePotentials(X,w,v_start,v_end,v,nFeatures,featureStart,sentences,s);
yViterbi = crfChain_decode(nodePot,edgePot);
[nodeBel,edgeBel,logZ] = crfChain_infer(nodePot,edgePot);
samples = crfChain_sample(nodePot,edgePot,10);

We can compute the negative log-likelihood of a set of training sentences trainNdx using:

crfChain_loss([w(:);v_start;v_end;v(:)],X,y,nStates,nFeatures,featureStart,sentences(trainNdx,:));

Subsequently we can train the model using this as an objective function:

[wv] = minFunc(@crfChain_loss,[w(:);v_start;v_end;v(:)],[],X,y,nStates,nFeatures,featureStart,sentences(trainNdx,:));
[w,v_start,v_end,v] = crfChain_splitWeights(wv,featureStart,nStates);

Example

Running 'example_crfChain' runs an example that illustrates the usage of crfChain, using a synthetic data set.

Download

The complete set of .m and .c files are available here.

Note that crfChain contains many sub-directories that must be present on the Matlab path for the files to work. You can add these sub-directories to the Matlab path by typing (in Matlab) 'addpath(genpath(crfChain_dir))', where 'crfChain_dir' is the directory that the zip file is extracted to.

The mex files for several architectures are included in the archive. For systems where the mex files are not included, you can compile the mex files by calling the 'mexAll' function. The default Matlab compiler does not work on Windows, but you can use MinGW and Gnumex instead.

Citations

If you use this software in a publication, please cite the work using the following information:

Using crfChain with L1-Regularization

A demo of using the crfChain code with the L1General code to do L1-regularized parameter estimation (to encourage setting parameters to exactly 0) is available here.

Faster training with Stochastic Average Gradient

The SAG4CRF software contains several methods that can be substantially faster than L-BFGS for training CRFs.

More general features/structures

The UGM package has support for CRFs with continuous features, features on the edges, parameter tieing, and general graph structures. The specific differences between UGM and crfChain are as follows:
Mark Schmidt > Software > crfChain