load X.mat
[nRows,nCols] = size(X);
nNodes = nRows*nCols;
nStates = 2;

y = int32(1+X);
y = reshape(y,[1 1 nNodes]);

% Make noisy observations of the binary image
X = X + randn(size(X))/2;
X = reshape(X,1,1,nNodes);

adj = sparse(nNodes,nNodes);

% Add Down Edges
ind = 1:nNodes;
exclude = sub2ind([nRows nCols],repmat(nRows,[1 nCols]),1:nCols); % No down edge for last row
ind = setdiff(ind,exclude);
adj(sub2ind([nNodes nNodes],ind,ind+1)) = 1;

% Add Right Edges
ind = 1:nNodes;
exclude = sub2ind([nRows nCols],1:nRows,repmat(nCols,[1 nRows])); % No right edge for last column
ind = setdiff(ind,exclude);
adj(sub2ind([nNodes nNodes],ind,ind+nRows)) = 1;

% Add Up/Left Edges
adj = adj+adj';

edgeStruct = UGM_makeEdgeStruct(adj,nStates);
nEdges = edgeStruct.nEdges;
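As a quick sanity check of the lattice construction (not part of the original demo), every node in the 4-connected grid should have between 2 and 4 neighbours, assuming the image has at least two rows and two columns:

degrees = full(sum(adj,2)); % number of neighbours of each node
assert(all(degrees >= 2 & degrees <= 4));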

Unlike the previous demo, each node has its own features: the features depend on local properties of the image, so they differ across locations.

% Add bias and Standardize Columns
tied = 1;
Xnode = [ones(1,1,nNodes) UGM_standardizeCols(X,tied)];
nNodeFeatures = size(Xnode,2);

% Make nodeMap
nodeMap = zeros(nNodes,nStates,nNodeFeatures,'int32');
for f = 1:nNodeFeatures
    nodeMap(:,1,f) = f;
end

Note that unlike the previous demo, where we only considered binary features, we now have continuous features. With continuous features it often makes sense to do some sort of normalization of the feature values. A common approach is to subtract off the mean of each feature and scale the values so that they have a standard deviation of 1. The function *UGM_standardizeCols* does this, transforming the features so that they have a mean of 0 and a standard deviation of 1. Setting *tied* to 1 indicates that the features are standardized across nodes, as opposed to within nodes. Note that this standardization is optional, and in some applications it may not make sense to do it. Also, if you want to apply the model to a new data set, *you will need to use the same normalization used when training* (the mean/variance of the new data may differ from those of the training data).
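For this single intensity feature, the tied standardization amounts to subtracting one mean and dividing by one standard deviation computed over all nodes. A minimal sketch of the idea, including how the training statistics could be re-used on a hypothetical new image Xtest (up to the exact normalization convention used by *UGM_standardizeCols*):

mu = mean(X(:));                % mean of the intensity feature over all nodes
sigma = std(X(:));              % its standard deviation over all nodes
Xstandardized = (X - mu)/sigma; % mean 0, standard deviation 1
% Xtest_standardized = (Xtest - mu)/sigma; % new data: re-use the training mu/sigma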

For this problem, we will use Ising-like edge potentials. When making the edge potentials, the bias should clearly be treated as a shared feature. However, the intensities of the two nodes on an edge are not shared features. So, we make our Xedge and edgeMap as follows:

Since one of the two node features is not shared, each edge will have a total of 3 features: the bias, the intensity of the first node, and the intensity of the second.

% Make Xedge
sharedFeatures = [1 0];
Xedge = UGM_makeEdgeFeatures(Xnode,edgeStruct.edgeEnds,sharedFeatures);
nEdgeFeatures = size(Xedge,2);

% Make edgeMap
f = max(nodeMap(:));
edgeMap = zeros(nStates,nStates,nEdges,nEdgeFeatures,'int32');
for edgeFeat = 1:nEdgeFeatures
    edgeMap(1,1,:,edgeFeat) = f+edgeFeat;
    edgeMap(2,2,:,edgeFeat) = f+edgeFeat;
end
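To see what this produces, we can inspect one edge. The comparison below reflects my reading of the column ordering (shared features first, then the non-shared feature at each endpoint), so treat it as an assumption to verify rather than documented behaviour:

e = 1;
n1 = edgeStruct.edgeEnds(e,1);
n2 = edgeStruct.edgeEnds(e,2);
Xedge(1,:,e)                                % the 3 edge features
[Xnode(1,1,n1) Xnode(1,2,n1) Xnode(1,2,n2)] % bias, intensity at n1, intensity at n2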

Given these features and maps, we can train the CRF. Since exact inference is intractable on this lattice, we first train by minimizing the pseudo-likelihood approximation of the negative log-likelihood, proposed in:

- Besag (1975). Statistical analysis of non-lattice data. The Statistician.

nParams = max([nodeMap(:);edgeMap(:)]);
w = zeros(nParams,1);
funObj = @(w)UGM_CRF_PseudoNLL(w,Xnode,Xedge,y,nodeMap,edgeMap,edgeStruct);
w = minFunc(funObj,w);

In this context we want to pick features that are likely to have an associative effect. In other words, the feature should be large if the nodes are likely to have the same value. The function *UGM_makeEdgeFeaturesInvAbsDif* implements one possible way to make edge features that might have an associative effect. This function assumes that shared features are non-negative, and for non-shared features it uses the reciprocal of 1 plus the absolute difference between the node features. It can be called as follows:

sharedFeatures = [1 0];
Xedge = UGM_makeEdgeFeaturesInvAbsDif(Xnode,edgeStruct.edgeEnds,sharedFeatures);
nEdgeFeatures = size(Xedge,2);

Since we have changed the edge features, we must also change the edge map:

f = max(nodeMap(:));
edgeMap = zeros(nStates,nStates,nEdges,nEdgeFeatures,'int32');
for edgeFeat = 1:nEdgeFeatures
    edgeMap(1,1,:,edgeFeat) = f+edgeFeat;
    edgeMap(2,2,:,edgeFeat) = f+edgeFeat;
end
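With this construction, each edge should now have two features: the shared bias and the inverse absolute difference of the endpoint intensities. A small check of this (again, the column ordering is my assumption):

e = 1;
n1 = edgeStruct.edgeEnds(e,1);
n2 = edgeStruct.edgeEnds(e,2);
Xedge(1,:,e)                               % the 2 edge features
[1 1/(1+abs(Xnode(1,2,n1)-Xnode(1,2,n2)))] % bias, inverse absolute difference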

nParams = max([nodeMap(:);edgeMap(:)]);
w = zeros(nParams,1);
funObj = @(w)UGM_CRF_PseudoNLL(w,Xnode,Xedge,y,nodeMap,edgeMap,edgeStruct); % Make objective with new Xedge/edgeMap
UB = [inf;inf;inf;inf]; % No upper bound on parameters
LB = [-inf;-inf;0;0]; % No lower bound on node parameters, edge parameters must be non-negative
w = minConf_TMP(funObj,w,LB,UB);

We proposed training CRFs with non-negative edge features and bound-constrained optimization, so that decoding with graph cuts is guaranteed to be possible, in:

- Cobzas &amp; Schmidt (2009). Increased discrimination in level set methods with embedded conditional random fields. Computer Vision and Pattern Recognition.
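With the edge parameters constrained to be non-negative (and the non-negative edge features above), the learned edge potentials are associative, so exact decoding with graph cuts applies. A sketch of how this could look with UGM's decoding routines (the function names below follow the other UGM demos; check them against your copy of the code):

[nodePot,edgePot] = UGM_CRF_makePotentials(w,Xnode,Xedge,nodeMap,edgeMap,edgeStruct,1);
yDecode = UGM_Decode_GraphCut(nodePot,edgePot,edgeStruct);
imagesc(reshape(yDecode,nRows,nCols)); colormap gray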

Instead of the pseudo-likelihood, we can approximate the negative log-likelihood itself by plugging an approximate inference routine into the objective. Here we train using loopy belief propagation for inference:

w = zeros(nParams,1);
funObj = @(w)UGM_CRF_NLL(w,Xnode,Xedge,y,nodeMap,edgeMap,edgeStruct,@UGM_Infer_LBP);
w = minConf_TMP(funObj,w,LB,UB)

Also note that we could replace loopy belief propagation with a different variational inference method. In particular, tree-reweighted belief propagation gives a convex upper bound on the exact negative log-likelihood, so minimizing it finds the optimal value of this upper bound. In contrast, the pseudo-likelihood is convex but is not an upper bound, while the loopy belief propagation approximation is neither convex nor an upper bound.
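For example, assuming the *UGM_Infer_TRBP* routine is available, the swap is just a change of the inference handle passed to *UGM_CRF_NLL* (a sketch, not code from the demo):

funObj = @(w)UGM_CRF_NLL(w,Xnode,Xedge,y,nodeMap,edgeMap,edgeStruct,@UGM_Infer_TRBP);
w = minConf_TMP(funObj,zeros(nParams,1),LB,UB);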

It is also possible to enforce the associative condition with more complicated parameterizations of the edge potentials. In particular, we can constrain the diagonal elements of the edge log-potentials to be non-negative and the off-diagonal elements to be non-positive.
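One possible way to set this up (a sketch of my own, not code from the demo) is to give the off-diagonal configurations their own parameters and bound them above by zero, while bounding the diagonal edge parameters below by zero; since the edge features are non-negative, the edge log-potentials then inherit these signs and the associative condition holds:

f = max(nodeMap(:));
edgeMap = zeros(nStates,nStates,nEdges,nEdgeFeatures,'int32');
for edgeFeat = 1:nEdgeFeatures
    edgeMap(1,1,:,edgeFeat) = f+edgeFeat;               % same-state (diagonal) parameters
    edgeMap(2,2,:,edgeFeat) = f+edgeFeat;
    edgeMap(1,2,:,edgeFeat) = f+nEdgeFeatures+edgeFeat; % different-state (off-diagonal) parameters
    edgeMap(2,1,:,edgeFeat) = f+nEdgeFeatures+edgeFeat;
end
nParams = max([nodeMap(:);edgeMap(:)]);
LB = [-inf(double(f),1); zeros(nEdgeFeatures,1); -inf(nEdgeFeatures,1)]; % node params free, diagonal edge params >= 0
UB = [ inf(double(f),1);  inf(nEdgeFeatures,1); zeros(nEdgeFeatures,1)]; % node params free, off-diagonal edge params <= 0
funObj = @(w)UGM_CRF_PseudoNLL(w,Xnode,Xedge,y,nodeMap,edgeMap,edgeStruct);
w = minConf_TMP(funObj,zeros(nParams,1),LB,UB);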
