Typically, using data to estimate good potential functions leads to a much better model than tuning the potential functions manually. This is true for several reasons, but one of the main ones is simply that in most models it is difficult to describe what the values of the potential functions mean in probabilistic terms (of course, there are some exceptions, like Markov chains). However, parameter estimation will generally be harder than decoding/inference/sampling with known parameters. Indeed, most parameter estimation methods need to use decoding/inference/sampling as a sub-routine. Further, we must also address the modeling issue of tying/parameterizing the potential functions.

In UGM, the data is stored as an nInstances-by-nNodes matrix. We can load this data set into a variable *y* using:

```matlab
load rain.mat
y = int32(X+1); % Convert from {0,1} double to {1,2} int32 representation
```

Below is a plot of the first 100 months:

In this plot, 'rain' (state 2) is represented by the color red, while 'no rain' (state 1) is represented by the color blue.
One of the simplest models we could fit to this data set would treat each day as an independent sample from a Bernoulli distribution. Here, we estimate the probability that it will rain as the average across all the days:

```matlab
p_rain = sum(y(:)==2)/numel(X)

p_rain =
    0.4115
```

So, we have non-zero daily precipitation on ~41% of the days. We can calculate the negative log-likelihood of the full data set under this simple independent model as follows:

```matlab
negloglik_y = -log(p_rain)*sum(y(:)==2) - log(1-p_rain)*sum(y(:)==1)

negloglik_y =
   2.0086e+004
```
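The demo next plots 100 sample months drawn from this independent model. Below is a minimal sketch of how such samples could be generated and displayed; the plotting details here are my own, not the original demo's code:

```matlab
% Draw 100 independent sample months from the Bernoulli model
nSamples = 100;
samplesInd = 1 + (rand(nSamples,size(y,2)) < p_rain); % state 2 ('rain') w.p. p_rain
imagesc(samplesInd); colormap([0 0 1;1 0 0]);         % blue = no rain, red = rain
```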

These samples look less 'clumpy' than the real data. This makes sense: the independent model does not take into account that if it rained yesterday, it is more likely to rain today. To capture this dependency, we will use a UGM to model the statistical dependency between adjacent days. However, unlike previous demos where we chose a set of node/edge potentials that we hoped would be a good model of the real data, in this demo we will use the real data to find the node/edge potentials that maximize the likelihood of the data. That is, we will give UGM the data, tell it to use a chain-structured dependency (along with some extra information about how to construct the potentials), and the output will be a set of parameters. We can then use these parameters to form our potentials and apply the usual (conditional) decoding/inference/sampling.

As always, our first step will be to build the edgeStruct that keeps track of information about the graphical structure. We will assume a chain-structured dependency:

```matlab
[nInstances,nNodes] = size(y);
nStates = max(y);

% Chain-structured adjacency matrix
adj = zeros(nNodes);
for i = 1:nNodes-1
    adj(i,i+1) = 1;
end
adj = adj+adj';

edgeStruct = UGM_makeEdgeStruct(adj,nStates);
```

To estimate parameters, UGM uses a parameter vector *w* together with two int32 arrays, *nodeMap* and *edgeMap*, that specify how the elements of *w* map to the potentials:

- We set **nodePot(n,s) = exp(w(nodeMap(n,s)))**.
- We set **edgePot(s1,s2,e) = exp(w(edgeMap(s1,s2,e)))**.

Entries that share the same map value are tied (they use the same parameter), while entries with a map value of 0 are fixed at exp(0) = 1.
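As a tiny illustration of this convention (the numbers here are made up, not from the demo):

```matlab
% One node with two states: state 1 uses wToy(1), state 2 is fixed
wToy = 0.25;
nodeMapToy = int32([1 0]);
nodePotToy = [exp(wToy(nodeMapToy(1))) 1]
% nodePotToy = [1.2840 1.0000], since a map value of 0 gives exp(0) = 1
```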

In our example, we will define the *nodeMap* as follows:

```matlab
maxState = max(nStates);
nodeMap = zeros(nNodes,maxState,'int32');
nodeMap(:,1) = 1;
```

By setting the first column of the *nodeMap* to 1, every node uses the first element of the parameter vector for the 'no rain' state, so the node potentials are tied across days. The zeros in the second column fix the potential of the 'rain' state at 1.

In our example, we will first define the *edgeMap* as follows:

```matlab
nEdges = edgeStruct.nEdges;
edgeMap = zeros(maxState,maxState,nEdges,'int32');
edgeMap(1,1,:) = 2;
edgeMap(2,2,:) = 2;
```

That is, we make all edges use the second element of the parameter vector for the (no rain,no rain) and (rain,rain) configurations, while the potentials for configurations where adjacent days disagree are fixed at 1.

The number of parameters is the largest value occurring in the maps, and we initialize the parameter vector to zero:

```matlab
nParams = max([nodeMap(:);edgeMap(:)]);
w = zeros(nParams,1);
```

We can use the maps and the parameter vector to make the potentials as follows:

```matlab
[nodePot,edgePot] = UGM_MRF_makePotentials(w,nodeMap,edgeMap,edgeStruct);
```

Since we initialized *w* to zero, all the potentials are initially exp(0) = 1.

Before evaluating the likelihood, we compute the sufficient statistics of the data, which count how often each parameter's feature occurs in the training examples:

```matlab
suffStat = UGM_MRF_computeSuffStat(y,nodeMap,edgeMap,edgeStruct);
```
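Under our maps, the first sufficient statistic should simply count the days in state 1 ('no rain'), and the second should count adjacent pairs of days taking the same state. A quick check of this (assuming suffStat has one entry per parameter, holding the total count of that parameter's feature):

```matlab
% These should match suffStat(1) and suffStat(2), respectively
check1 = sum(y(:)==1);                       % days with 'no rain'
check2 = sum(sum(y(:,1:end-1)==y(:,2:end))); % equal adjacent days
```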

Given the sufficient statistics, we can evaluate the negative log-likelihood under the initial parameters:

```matlab
nll = UGM_MRF_NLL(w,nInstances,suffStat,nodeMap,edgeMap,edgeStruct,@UGM_Infer_Chain)

nll =
   2.0553e+04
```
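Since w = 0 makes all the potentials equal, the initial model is uniform over the 2^28 possible configurations of a month, so this initial value has a closed form we can verify (a quick sanity check, not part of the original demo):

```matlab
% Under uniform potentials each month has probability 2^(-nNodes), so the
% negative log-likelihood is exactly nInstances*nNodes*log(2)
nllUniform = nInstances*nNodes*log(2) % matches the 2.0553e+04 above
```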

To find the maximum likelihood estimate of *w*, we minimize the negative log-likelihood using *minFunc*:

```matlab
w = minFunc(@UGM_MRF_NLL,w,[],nInstances,suffStat,nodeMap,edgeMap,edgeStruct,@UGM_Infer_Chain)

 Iteration   FunEvals     Step Length    Function Val        Opt Cond
         1          2     1.16857e-04     1.79845e+04     1.63026e+03
         2          3     1.00000e+00     1.78218e+04     3.67640e+02
         3          4     1.00000e+00     1.78108e+04     2.88642e+01
         4          5     1.00000e+00     1.78108e+04     4.99058e+00
         5          6     1.00000e+00     1.78108e+04     4.94596e-02
         6          7     1.00000e+00     1.78108e+04     1.37016e-03
Directional Derivative below progTol

w =
    0.1590
    0.8504
```

Looking at the *Function Val* column in the output of *minFunc*, we see that the negative log-likelihood with the estimated parameters (1.78108e+04) is much lower than with the initial parameters (2.0553e+04), and also much lower than under the independent model (2.0086e+04).

If we want to, we can make and look at the potentials based on the estimated parameters:

```matlab
[nodePot,edgePot] = UGM_MRF_makePotentials(w,nodeMap,edgeMap,edgeStruct);
nodePot(1,:)
edgePot(:,:,1)

ans =
    1.1723    1.0000

ans =
    2.3406    1.0000
    1.0000    2.3406
```

Looking at the potentials, we see that the node potential is higher for the 'no rain' state, and the edge potentials are higher for adjacent days taking the same value (these are simply exp(0.1590) ≈ 1.1723 and exp(0.8504) ≈ 2.3406). Although this is exactly what we would expect, by optimizing the parameters we have, in some sense, found the 'best' such parameters.

Nothing forces us to tie the parameters in exactly this way. Below, we instead use a different parameter for each configuration of adjacent days, fixing the (rain,rain) configuration at 1 (we can always fix one configuration, since rescaling the potentials does not change the distribution):

```matlab
% Use different parameters for different combinations of states
edgeMap(2,2,:) = 0;
edgeMap(1,2,:) = 3;
edgeMap(2,1,:) = 4;

% Initialize weights
nParams = max([nodeMap(:);edgeMap(:)]);
w = zeros(nParams,1);

% Compute sufficient statistics
suffStat = UGM_MRF_computeSuffStat(y,nodeMap,edgeMap,edgeStruct);

% Optimize
w = minFunc(@UGM_MRF_NLL,w,[],nInstances,suffStat,nodeMap,edgeMap,edgeStruct,@UGM_Infer_Chain)

 Iteration   FunEvals     Step Length    Function Val        Opt Cond
         1          2     7.12949e-05     1.89590e+04     3.14727e+03
         2          3     1.00000e+00     1.85381e+04     1.26201e+03
         3          4     1.00000e+00     1.81469e+04     1.28848e+03
         4          5     1.00000e+00     1.79040e+04     1.23205e+03
         5          6     1.00000e+00     1.78407e+04     2.22810e+02
         6          7     1.00000e+00     1.78393e+04     4.11467e+01
         7          9     1.00000e+01     1.78379e+04     1.01343e+02
         8         10     1.00000e+00     1.78333e+04     1.61229e+02
         9         11     1.00000e+00     1.78192e+04     2.38441e+02
        10         12     1.00000e+00     1.78089e+04     1.72145e+02
        11         13     1.00000e+00     1.78018e+04     5.25205e+01
        12         14     1.00000e+00     1.77996e+04     3.54473e+01
        13         15     1.00000e+00     1.77988e+04     5.37657e+01
        14         16     1.00000e+00     1.77980e+04     4.41137e+01
        15         17     1.00000e+00     1.77977e+04     7.91935e+00
        16         18     1.00000e+00     1.77977e+04     4.92522e-01
        17         19     1.00000e+00     1.77977e+04     2.56970e-02
        18         20     1.00000e+00     1.77977e+04     4.31682e-03
        19         21     1.00000e+00     1.77977e+04     1.71722e-04
Directional Derivative below progTol

w =
    0.5968
   -0.4499
   -1.0029
   -1.1506
```

This less tied parameterization reaches a slightly lower negative log-likelihood (1.77977e+04 vs. 1.78108e+04). Making the potentials with the new parameters gives:

```matlab
[nodePot,edgePot] = UGM_MRF_makePotentials(w,nodeMap,edgeMap,edgeStruct);
nodePot(1,:)
edgePot(:,:,1)

ans =
    1.8163    1.0000

ans =
    0.6377    0.3668
    0.3165    1.0000
```

So far, the two end points of the chain are treated like every other day, even though they each have only one neighbor. We can model this boundary condition by giving the end points their own node parameter:

```matlab
% Use a separate parameter for the 'no rain' state at the end points
nodeMap(1,1) = 5;
nodeMap(end,1) = 5;

% Initialize weights
nParams = max([nodeMap(:);edgeMap(:)]);
w = zeros(nParams,1);

% Compute sufficient statistics
suffStat = UGM_MRF_computeSuffStat(y,nodeMap,edgeMap,edgeStruct);

% Optimize
w = minFunc(@UGM_MRF_NLL,w,[],nInstances,suffStat,nodeMap,edgeMap,edgeStruct,@UGM_Infer_Chain)

 Iteration   FunEvals     Step Length    Function Val        Opt Cond
         1          2     7.12949e-05     1.89092e+04     2.96046e+03
         2          3     1.00000e+00     1.85021e+04     1.36128e+03
         3          4     1.00000e+00     1.80446e+04     1.60027e+03
         4          5     1.00000e+00     1.78524e+04     1.07821e+03
         5          6     1.00000e+00     1.78029e+04     2.56363e+02
         6          7     1.00000e+00     1.78007e+04     4.49694e+01
         7          8     1.00000e+00     1.78006e+04     4.37522e+01
         8          9     1.00000e+00     1.77998e+04     7.82306e+01
         9         10     1.00000e+00     1.77989e+04     1.03849e+02
        10         11     1.00000e+00     1.77981e+04     6.17314e+01
        11         12     1.00000e+00     1.77979e+04     1.36103e+01
        12         13     1.00000e+00     1.77979e+04     6.93718e+00
        13         14     1.00000e+00     1.77979e+04     1.28554e+01
        14         15     1.00000e+00     1.77979e+04     2.21724e+01
        15         16     1.00000e+00     1.77978e+04     2.85041e+01
        16         17     1.00000e+00     1.77978e+04     2.28412e+01
        17         18     1.00000e+00     1.77977e+04     7.15382e+00
        18         19     1.00000e+00     1.77977e+04     8.17118e-01
        19         20     1.00000e+00     1.77977e+04     1.84132e-02
        20         21     1.00000e+00     1.77977e+04     6.14687e-03
        21         22     1.00000e+00     1.77977e+04     2.12248e-05
Directional Derivative below progTol

w =
   -0.2839
    0.4308
   -0.5626
   -0.7103
    0.1565
```

```matlab
[nodePot,edgePot] = UGM_MRF_makePotentials(w,nodeMap,edgeMap,edgeStruct);
nodePot(1,:)
nodePot(2,:)
edgePot(:,:,1)

ans =
    1.1694    1.0000

ans =
    0.7529    1.0000

ans =
    1.5384    0.5697
    0.4915    1.0000
```

The likelihood was not noticeably affected by modeling the boundary condition, but the parameters we obtained are significantly different. For example, the potential of the (no rain,no rain) configuration is now higher than that of the (rain,rain) configuration. This illustrates the difficulty of interpreting the potentials directly, as opposed to looking at marginals obtained via inference or at samples from the distribution.
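If we want a more interpretable summary of what a parameterization has learned, we can compare marginals instead of potentials. A minimal sketch using the same chain inference routine used during training:

```matlab
% Compute exact marginals for the chain model; nodeBel(n,s) is the
% marginal probability that node n takes state s
[nodeBel,edgeBel,logZ] = UGM_Infer_Chain(nodePot,edgePot,edgeStruct);
nodeBel(1:3,:) % marginals of [no rain, rain] for the first three days
```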

Below, we show a plot of 100 samples from the model:

```matlab
% The number of samples generated is given by edgeStruct.maxIter
samples = UGM_Sample_Chain(nodePot,edgePot,edgeStruct);
```

These samples look much more like the real data than the samples from our independent model. Like in the real data, we see long streaks of rainy days, as well as long streaks of days with no rain. Such long streaks are very unlikely under an independent model where the average probability of rain on a random day is 41.15%.

We can just as easily do decoding or inference with the estimated potentials, as we did in previous demos. Conditional decoding/inference/sampling is also straightforward, and we can use it to make predictions given partial observations. For example, if we see that it rains on the first days of a month, we can use the model to predict what the rest of the month might look like. Below, we condition on the first two days taking the 'rain' state and generate 100 conditional samples of the rest of the month:

```matlab
clamped = zeros(nNodes,1);
clamped(1:2) = 2; % observe 'rain' on the first two days
condSamples = UGM_Sample_Conditional(nodePot,edgePot,edgeStruct,clamped,@UGM_Sample_Chain);
```
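We could similarly compute the most likely completion of the month given the observed days. A sketch, assuming the analogous conditional decoding wrapper from the earlier demo on conditioning:

```matlab
% Most likely joint configuration of the unobserved days, given rain on days 1-2
optimalDecoding = UGM_Decode_Conditional(nodePot,edgePot,edgeStruct,clamped,@UGM_Decode_Chain);
```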

In the special case of triangulated graphs (also known as *decomposable* graphical models), the maximum likelihood estimates can be obtained in closed form, without the need for a gradient method. However, this is not implemented in UGM. Classically, the iterative proportional fitting (IPF) algorithm was used to fit the parameters of MRFs. This method is a form of coordinate descent, but on many problems it is much slower than using L-BFGS (especially for over-parameterized models), so I did not include an implementation of it.
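To get a feel for IPF, note that each update rescales one clique's potential by the ratio between the empirical marginal and the model's current marginal for that clique. Below is a rough sketch of a single update for the first edge of our chain; this is illustrative only (it is not a UGM routine, and empBel/edgePotIPF are names I made up):

```matlab
% Current model marginals for all nodes and edges
[nodeBel,edgeBel] = UGM_Infer_Chain(nodePot,edgePot,edgeStruct);

% Empirical marginal for edge 1, which joins days 1 and 2
empBel = zeros(maxState);
for s1 = 1:maxState
    for s2 = 1:maxState
        empBel(s1,s2) = mean(y(:,1)==s1 & y(:,2)==s2);
    end
end

% IPF update: afterwards, the model's marginal for edge 1 equals empBel
edgePotIPF = edgePot;
edgePotIPF(:,:,1) = edgePot(:,:,1).*empBel./edgeBel(:,:,1);
```

Cycling such updates over all the potentials is a coordinate descent on the negative log-likelihood, which is why it can be slow compared to a quasi-Newton method that moves all parameters at once.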

In this and the subsequent demos on parameter estimation, we assume that the graph structure is fixed across training examples. This is why we used the (somewhat awkward) truncation of the rain data to the first 28 days of each month. To use UGM for data sets where different training examples have different graph structures (or numbers of nodes, numbers of states, etc.), you simply need to optimize the sum of the negative log-likelihoods across the examples with different structures.
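A minimal sketch of such an objective, assuming each training example has been bundled into a cell array of structs carrying its own graph and maps (the sumNLL helper and its fields are hypothetical, not part of UGM):

```matlab
function [f,g] = sumNLL(w,examples)
% Sum the negative log-likelihoods (and gradients) of training examples
% that each have their own graph structure, maps, and sufficient statistics
f = 0;
g = zeros(size(w));
for i = 1:numel(examples)
    ex = examples{i};
    [fi,gi] = UGM_MRF_NLL(w,1,ex.suffStat,ex.nodeMap,ex.edgeMap,ex.edgeStruct,@UGM_Infer_Chain);
    f = f + fi;
    g = g + gi;
end
```

This objective can then be minimized with minFunc(@sumNLL,w,[],examples).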

