cd CRF2D              % wherever you put the zip file
addpath(genpath(pwd)) % assumes matlab 7
demoSingle
demoMulti

If you have a Mac, you may have to compile and put the following mex files onto your path: BP_GeneralComplex_C.mexmac, BP_General_C.mexmac, MF_GeneralComplex_C.mexmac.

nodeFeatures(:,i,s) = [1, raw(:,r,c,s)]            if expandNode = 0
nodeFeatures(:,i,s) = [1, quadratic(raw(:,r,c,s))] if expandNode = 1

where i corresponds to node (r,c) and quadratic(v) is a quadratic expansion of the vector v. We can also create edge features using

edgeFeatures(:,e,s) = [1, |raw(:,r,c) - raw(:,r',c')|]            if expandEdge = 0
edgeFeatures(:,e,s) = [1, quadratic(|raw(:,r,c) - raw(:,r',c')|)] if expandEdge = 1

where e corresponds to the edge connecting (r,c)-(r',c').
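To make the construction above concrete, here is a minimal numpy sketch of how per-node and per-edge features might be assembled for a single case. The `quadratic` helper is a hypothetical expansion (original terms plus squares); the toolbox's actual expansion may also include cross terms, and its internal layout may differ.

```python
import numpy as np

def quadratic(v):
    # Hypothetical quadratic expansion: original terms plus their squares.
    # The toolbox's actual expansion may also include cross terms.
    return np.concatenate([v, v ** 2])

def make_node_features(raw, expand_node=False):
    # raw: (nfeat, nrows, ncols) array of raw pixel features for one case.
    nfeat, nr, nc = raw.shape
    feats = []
    for r in range(nr):
        for c in range(nc):
            v = raw[:, r, c]
            if expand_node:
                v = quadratic(v)
            feats.append(np.concatenate([[1.0], v]))  # leading bias term
    return np.array(feats).T  # shape (1 + d, nnodes)

def make_edge_features(raw, expand_edge=False):
    # One feature vector per edge of the 4-connected lattice, built from
    # absolute differences of the endpoint features.
    nfeat, nr, nc = raw.shape
    feats, edges = [], []
    for r in range(nr):
        for c in range(nc):
            for (r2, c2) in [(r, c + 1), (r + 1, c)]:  # right and down neighbours
                if r2 < nr and c2 < nc:
                    d = np.abs(raw[:, r, c] - raw[:, r2, c2])
                    if expand_edge:
                        d = quadratic(d)
                    feats.append(np.concatenate([[1.0], d]))
                    edges.append(((r, c), (r2, c2)))
    return np.array(feats).T, edges
```

For a 3x3 image the 4-connected lattice has 12 edges, so `make_edge_features` returns a matrix with 12 columns.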

The node potentials are given by

nodepot(xi,i) = exp[ xi w' * nodeFeatures(:,i,s) ]

where xi is the state (+1, -1) of node i, which is in row r, column c. Similarly, the edge potentials for edge e (connecting nodes i, j) have the form

edgepot(xi,xj,e) = exp[ xi xj v' * edgeFeatures(:,e,s) ]
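As a concrete illustration, the two potential tables defined above can be computed as follows in a numpy sketch, with states ±1 and with w and v denoting the node and edge weight vectors. This is a simplified rendering of the formulas, not the toolbox's mkPotentials implementation.

```python
import numpy as np

def node_potentials(w, node_feats):
    # node_feats: (d, nnodes). Returns pot[x, i] with row 0 for x = -1
    # and row 1 for x = +1, i.e. exp(x * w' * f_i).
    a = w @ node_feats                  # (nnodes,) inner products
    return np.exp(np.stack([-a, a]))

def edge_potentials(v, edge_feats):
    # edge_feats: (d, nedges). Returns pot[xi, xj, e] = exp(xi * xj * v' * f_e),
    # indexing states as 0 -> -1, 1 -> +1.
    b = v @ edge_feats                  # (nedges,)
    pot = np.empty((2, 2, b.size))
    pot[0, 0] = pot[1, 1] = np.exp(b)   # xi * xj = +1
    pot[0, 1] = pot[1, 0] = np.exp(-b)  # xi * xj = -1
    return pot
```

Note that only the product xi*xj enters the edge potential, so the 2x2 table has just two distinct values per edge.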

The gradient is the observed minus the expected features:

gradNode(:,i,s) = label(i,s) * nodeFeatures(:,i,s) - sum_x bel(x,i,s) * x * nodeFeatures(:,i,s)

where bel(x,i,s) = P(X(i)=x|s) is the marginal belief for case s. Similarly,

gradEdge(:,e,s) = label(i,s) * label(j,s) * edgeFeatures(:,e,s) - sum_{x,x'} bel2(x,x',e,s) * x * x' * edgeFeatures(:,e,s)

where bel2(x,x',e,s) = P(X(i)=x, X(j)=x'|s) is the pairwise belief and edge e connects i and j.
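The "observed minus expected" structure of these gradients can be sketched for a single node and a single edge as follows; `bel_i` and `bel2_e` stand for the marginal and pairwise beliefs produced by the inference step, with states ordered (-1, +1).

```python
import numpy as np

def node_gradient(label_i, node_feat_i, bel_i):
    # bel_i: [P(x=-1), P(x=+1)] marginal belief at node i.
    # Observed feature minus expected feature, states x in {-1, +1}.
    expected_x = -1.0 * bel_i[0] + 1.0 * bel_i[1]
    return label_i * node_feat_i - expected_x * node_feat_i

def edge_gradient(label_i, label_j, edge_feat_e, bel2_e):
    # bel2_e: 2x2 pairwise belief, bel2_e[a, b] = P(xi=states[a], xj=states[b]).
    states = np.array([-1.0, 1.0])
    expected_xx = np.sum(bel2_e * np.outer(states, states))
    return label_i * label_j * edge_feat_e - expected_xx * edge_feat_e
```

When the beliefs put all their mass on the observed labels, both gradients vanish, which is the usual moment-matching condition at the optimum.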

You may also be interested in some matlab/C code for inference in undirected MRFs (with pairwise potentials) by Talya Meltzer.

label = sign(double(imread('X.PNG'))-1);
label = label(:,:,1);
noisy = label + randn(32,32);
imshow([label noisy]);
title('Labels Observed');

We will use the left image as the class labels (-1 or 1), and the right image as the features at each pixel location. We now want to put this in the appropriate data structure, which we will call 'trainingdata':

featureEng = latticeFeatures(0,0);
features = permute(noisy,[4 1 2 3]);
traindata.nodeFeatures = mkNodeFeatures(featureEng,features);
traindata.edgeFeatures = mkEdgeFeatures(featureEng,features);
traindata.nodeLabels = label;
traindata.ncases = 1;
trainNdx = 1:traindata.ncases;

We first initialize our feature engine object. We have specified (0,0) since we want to use our features directly, as opposed to performing a quadratic basis function expansion. The 'mkNodeFeatures' and 'mkEdgeFeatures' functions put our extracted features in the appropriate format, using the parameters of the feature engine.

Before training the model, we select an initial set of parameters (using small random values). Since the objective function is convex, the initial values should not matter very much, but they can affect the numerical stability of approximate inference.

nNodeFeatures = size(traindata.nodeFeatures,1);
nEdgeFeatures = size(traindata.edgeFeatures,1);
weights = initWeights(featureEng,nNodeFeatures,nEdgeFeatures);

Finally, we will try doing inference with our initial parameters (using the mean field approximation for inference):

nstates = 2;
infEng = latticeInferMF(nstates);
caseNum = 1;
featureEng = enterEvidence(featureEng, traindata, caseNum); % store data
[nodePot, edgePot] = mkPotentials(featureEng, weights);
[nodeBel, MAPlabels] = infer(infEng, nodePot, edgePot);
imshow(MAPlabels);

Above, we first created a mean field inference engine. We then entered case 1 of our training data into the feature engine. The feature engine subsequently lets us make the node and edge potentials needed to perform inference with the inference engine. The inference engine returns the node beliefs (marginal probabilities) and assigns labels. Inference with random parameters tends to be a sub-optimal labeling strategy; below are some examples of running this demo:
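To give a concrete picture of what the mean field engine computes, here is a simplified numpy sketch for a pairwise model with ±1 states, a local field a_i per node, and a scalar coupling b_ij per edge (a hypothetical parameterization; latticeInferMF works with the full potential tables, but its fixed-point updates have this flavor).

```python
import numpy as np

def mean_field_ising(a, b, iters=50):
    # a: (nnodes,) local fields; b: dict {(i, j): coupling}, symmetric.
    # Returns the approximate means m_i = E[x_i] under the fully
    # factorized (mean field) approximation.
    n = a.size
    nbrs = {i: [] for i in range(n)}
    for (i, j), bij in b.items():
        nbrs[i].append((j, bij))
        nbrs[j].append((i, bij))
    m = np.zeros(n)
    for _ in range(iters):
        for i in range(n):
            # Fixed-point update: each node feels its local field plus the
            # current mean of each neighbour, weighted by the coupling.
            field = a[i] + sum(bij * m[j] for j, bij in nbrs[i])
            m[i] = np.tanh(field)
    return m

# MAP-style labels from the approximate marginals:
# labels = np.where(mean_field_ising(a, b) > 0, 1, -1)
```

The node beliefs follow from the means via P(x_i = +1) = (1 + m_i) / 2, which is how the marginals reported by the engine relate to this sketch.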

reg = 1;
maxIter = 10;
options = optimset('Display','iter','Diagnostics','off','GradObj','on',...
    'LargeScale','off','MaxFunEval',maxIter);

We specify the pseudolikelihood objective and gradient function, and set up the associated arguments:

gradFunc = @pseudoLikGradient;
gradArgs = {featureEng,traindata,reg};

And we optimize the objective function:

winit = initWeights(featureEng, nNodeFeatures, nEdgeFeatures);
weights = fminunc(gradFunc,winit,options,trainNdx,gradArgs{:});

We can plot the results as before:

[nodePot, edgePot] = mkPotentials(featureEng, weights);
[nodeBel, MAPlabels] = infer(infEng, nodePot, edgePot);
imshow(MAPlabels);

For this data set, the optimal pseudolikelihood parameters do not prove to be very interesting (except to show that pseudolikelihood tends to over-smooth, in this case to an all-white image):
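For reference, the pseudolikelihood objective that pseudoLikGradient optimizes replaces the joint likelihood (and its intractable partition function) with a product of per-node conditionals. A numpy sketch for a ±1 model with local fields a and scalar couplings b (a hypothetical parameterization; the toolbox uses the full feature-based potentials) looks like this:

```python
import numpy as np

def pseudo_loglik(a, b, x):
    # a: (n,) local fields; b: dict {(i, j): coupling}; x: (n,) labels in {-1, +1}.
    # Sum over nodes of log P(x_i | x_neighbours). Each conditional only
    # needs the local field plus the neighbouring labels, so no global
    # partition function is required.
    n = a.size
    nbrs = {i: [] for i in range(n)}
    for (i, j), bij in b.items():
        nbrs[i].append((j, bij))
        nbrs[j].append((i, bij))
    total = 0.0
    for i in range(n):
        c = a[i] + sum(bij * x[j] for j, bij in nbrs[i])
        # log P(x_i | rest) = x_i * c - log(2 cosh c) for binary +-1 states.
        total += x[i] * c - np.log(2.0 * np.cosh(c))
    return total
```

Because each term conditions on the *observed* neighbouring labels, the objective is cheap and convex, but it can over-weight the smoothing terms, which is consistent with the over-smoothed result noted above.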

infEng = latticeInferMF(nstates);
gradFunc = @scrfGradient;
gradArgs = {featureEng, infEng, traindata, reg};
weights = fminunc(gradFunc,winit,options,trainNdx,gradArgs{:});

The results are much better.

infEng = latticeInferBP(nstates);
gradFunc = @scrfGradient;
gradArgs = {featureEng, infEng, traindata, reg};
weights = fminunc(gradFunc,winit,options,trainNdx,gradArgs{:});

Example results of an LBP-trained model with LBP inference are shown below:

imshow(medfilt2(noisy)>0)

This gives poor results in this case.

reg = 1;
maxIter = 3;
options = optimset('Display','iter','Diagnostics','off','GradObj','on',...
    'LargeScale','off','MaxFunEval',maxIter);
gradFunc = @scrfGradient;
gradArgs = {featureEng, infEng, traindata, reg};
weights = fminunc(gradFunc,winit,options,trainNdx,gradArgs{:});

After processing each of the 50 training images 3 times, the batch training procedure does not make significant progress, given the redundancy in the data:

eta = 0.0001;
batch_size = 1;
anneal = 0;
tau = 0;

Training follows a similar form to the batch scenario:

weights = stochgrad(gradFunc,winit,trainNdx,'gradArgs',gradArgs,...
    'maxIter',maxIter,'eta',eta,'batch_size',batch_size);

This demo uses only 1 iteration, so training is on-line. Some result images are shown below:
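The core of a routine like stochgrad can be sketched as a plain mini-batch gradient descent loop; the function name and signature below are hypothetical, and this omits the annealing (anneal, tau) options that the toolbox version accepts.

```python
import numpy as np

def stochastic_gradient(grad_func, w_init, cases, eta=1e-4, batch_size=1,
                        max_iter=1, rng=None):
    # Minimal stochastic gradient descent: one pass over the shuffled cases
    # per iteration, updating after every mini-batch of size batch_size.
    rng = rng or np.random.default_rng(0)
    w = np.array(w_init, dtype=float)
    for _ in range(max_iter):
        order = rng.permutation(len(cases))
        for start in range(0, len(order), batch_size):
            batch = [cases[k] for k in order[start:start + batch_size]]
            g = np.mean([grad_func(w, c) for c in batch], axis=0)
            w -= eta * g  # constant step size; no annealing in this sketch
    return w
```

With max_iter = 1 and batch_size = 1, each training case is visited exactly once, which is the fully on-line regime the demo uses.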

eta0 = 0.0001;
mu = 0.001;
lambda = 0.9;

Training is done using the following:

weights = smd(gradFunc,winit,trainNdx,'gradArgs',gradArgs,...
    'maxIter',maxIter,'eta0',eta0,'batch_size',batch_size,...
    'lambda', lambda, 'mu', mu);

On this data set we have a very small number of features, so SMD does not give a substantial improvement. Below are some example results for the on-line training scenario after viewing all training images: