This package is a set of Matlab functions for chain-structured conditional random fields (CRFs) with categorical features. The code implements decoding (with the Viterbi algorithm), inference (with the forwards-backwards algorithm), sampling (with the forwards-filter bacwards-sample algorithm), and parameter estimation (with a limited-memory quasi-Newton algorithm) in these models. Several of the functions have been implemented in C as mex files to speed up calculations.

The functions use a data matrix **X** and a label vector **y**. Each element *i* in **y** gives the label at position *i*, among the values *0-k* (for some *k*). The label *0* is special, it indicates the position between 'sentences'. Each element *(i,j)* in the data matrix **X** gives the value of feature *j* at position *i*, among the values *0-f* (for some *f* that can be different for each feature). A feature value of *0* is special, it indicates that the feature is ignored at this position (this is used in natural language applications to represent words that don't occur very often). The features at positions with label values of *0* are ignored.

The model uses four sets of parameters:

**w**_{i,j,k}: The potential of state *j* given that feature *i* takes the value *k*.

**v**_{start,j}: The potential of state *j* to start a sentence.

**v**_{end,j}: The potential of state *j* to end a sentence.

**v**_{i,j}: The potential for the transition from state *i* to *j*.

The specific model for a single sentence of length *s* where each word has
*f* features is:

p(y_{1:s} | x_{1:s}, w, v_{start}, v_{end}, v) =
(1/Z)exp( &Sigma_{{i=1:s}}&Sigma_{{j=1:f}}[w_{j,yi,xi,j}] + v_{start,y1} + v_{end,ys} + &Sigma_{{i=1:s-1}}[v_{yi,i+1}] )

The value of *Z* is chosen so that the sum over assignment of
y_{1:s} is equal to one. To add a bias for each label, you can append a column of ones to **X**.

nFeatures = max(X);

[w,v_start,v_end,v] = crfChain_initWeights(nFeatures,nStates,'zeros');

featureStart = cumsum([1 nFeatures(1:end)]);

sentences = crfChain_initSentences(y);

We can make the potential functions, do decoding, do inference, and generate

yViterbi = crfChain_decode(nodePot,edgePot);

[nodeBel,edgeBel,logZ] = crfChain_infer(nodePot,edgePot);

samples = crfChain_sample(nodePot,edgePot,10);

We can compute the negative log-likelihood of a set of training sentences

Subsequently we can train the model using this as an objective function:

[w,v_start,v_end,v] = crfChain_splitWeights(wv,featureStart,nStates);

Note that crfChain contains many sub-directories that must be present on the Matlab path for the files to work. You can add these sub-directories to the Matlab path by typing (in Matlab) 'addpath(genpath(crfChain_dir))', where 'crfChain_dir' is the directory that the zip file is extracted to.

The mex files for several architectures are included in the archive. For systems where the mex files are not included, you can compile the mex files by calling the 'mexAll' function. The default Matlab compiler does not work on Windows, but you can use MinGW and Gnumex instead.

- M. Schmidt, K. Swersky.
**crfChain: Matlab code for chain-structured conditional random fields with categorical features**. http://www.cs.ubc.ca/~schmidtm/Software/crfChain.html, 2008.

- The 'Y' variable in UGM is a nSamples by nNodes matrix, and any graph structure can be assumed of the samples. In crfChain the graph structure is forced to a chain, and the 'Y' variable is a vector that concatenates all the chains (requring a '0' in the 'position' between the chains). By always requiring the graph structure to be a chain, crfChain saves on some of the overhead required to deal with the graph structure.
- In UGM, the features 'X' are assumed to be real-valued, whereas in crfChain they are assumed categorical. In UGM, you can simulate these categorical variables. If you want to simulate a categorical values that takes 's' states, simply use 's' binary indicator variables. By requiring that categorical features are used, crfChain allows a more efficient representation when the number of categories is very large.
- In crfChain there are special potentials for the start and the end of the chain. This can be simuilated in UGM by adding two extra indicator features, and UGM offers even more fleixbility to do things of this nature.
- In crfChain we assume a full node and edge potential parameterization and that the edge features only depend on the labels. In contrast, UGM allows you to share parameters within the potentials, as well as have non-bias features on the edges.
- In crfChain we assume that all nodes and all edges share the same parameter, whereas UGM gives you the option to allow different nodes/edges to have different parameters.

Mark Schmidt > Software > crfChain