## HMMs with discrete outputs

### Maximum likelihood parameter estimation using EM (Baum Welch)

The script dhmm_em_demo.m gives an example of how to learn an HMM
with discrete outputs. Let there be Q=2 states and O=3 output
symbols. We create random stochastic matrices as follows.
O = 3;
Q = 2;
prior0 = normalise(rand(Q,1));
transmat0 = mk_stochastic(rand(Q,Q));
obsmat0 = mk_stochastic(rand(Q,O));

Now we sample nex=20 sequences of length T=10 each from this model, to
use as training data.
T=10;
nex=20;
data = dhmm_sample(prior0, transmat0, obsmat0, nex, T);

Here data is 20x10.
Now we make a random guess as to what the parameters are,
prior1 = normalise(rand(Q,1));
transmat1 = mk_stochastic(rand(Q,Q));
obsmat1 = mk_stochastic(rand(Q,O));

and improve our guess using 5 iterations of EM...
[LL, prior2, transmat2, obsmat2] = dhmm_em(data, prior1, transmat1, obsmat1, 'max_iter', 5);

LL(t) is the log-likelihood after iteration t, so we can plot the
learning curve.
### Sequence classification

To evaluate the log-likelihood of a trained model given test data,
proceed as follows:
loglik = dhmm_logprob(data, prior2, transmat2, obsmat2)

Note: the discrete alphabet is assumed to be {1, 2, ..., O},
where O = size(obsmat, 2). Hence data cannot contain any 0s.
To classify a sequence into one of k classes, train up k HMMs, one per
class, and then compute the log-likelihood that each model gives to
the test sequence; if the i'th model is the most likely, then declare
the class of the sequence to be class i.

### Computing the most probable sequence (Viterbi)

First you need to evaluate B(i,t) = P(y_t | Q_t=i) for all t,i:
B = multinomial_prob(data, obsmat);

Then you can use
[path] = viterbi_path(prior, transmat, B)

## HMMs with mixture of Gaussians outputs

### Maximum likelihood parameter estimation using EM (Baum Welch)

Let us generate nex=50 vector-valued sequences of length T=50; each
vector has size O=2.
O = 2;
T = 50;
nex = 50;
data = randn(O,T,nex);

Now let use fit a mixture of M=2 Gaussians for each of the Q=2 states
using K-means.
M = 2;
Q = 2;
left_right = 0;
prior0 = normalise(rand(Q,1));
transmat0 = mk_stochastic(rand(Q,Q));
[mu0, Sigma0] = mixgauss_init(Q*M, reshape(data, [O T*nex]), cov_type);
mu0 = reshape(mu0, [O Q M]);
Sigma0 = reshape(Sigma0, [O O Q M]);
mixmat0 = mk_stochastic(rand(Q,M));

Finally, let us improve these parameter estimates using EM.
[LL, prior1, transmat1, mu1, Sigma1, mixmat1] = ...
mhmm_em(data, prior0, transmat0, mu0, Sigma0, mixmat0, 'max_iter', 2);

Since EM only finds a local optimum, good initialisation is crucial.
**The initialisation procedure illustrated above is very crude, and is probably
not adequate for real applications...**
Click
This is a well-known pathology of maximum likelihood estimation for
Gaussian mixtures: the global optimum may place one mixture component
on a single data point, and give it 0 covariance, and hence infinite
likelihood. One usually relies on the fact that EM cannot find the
global optimum to avoid such pathologies.

Buried inside of KPMstats/mixgauss_Mstep you will see that cov_prior
is initialized to 0.01*I. This is added to the maximum likelihood
estimate after every M step.
To change this, you will need to modify the
mhmm_em function so it calls mixgauss_Mstep with a different value.
### Sequence classification

To classify a sequence (e.g., of speech) into one of k classes (e.g.,
the digits 0-9), proceed as in the DHMM case above,
but use the following procedure to compute likelihood:
loglik = mhmm_logprob(data, prior, transmat, mu, Sigma, mixmat);

### Computing the most probable sequence (Viterbi)

First you need to evaluate B(t,i) = P(y_t | Q_t=i) for all t,i:
B = mixgauss_prob(data(:,:,ex), mu, Sigma, mixmat);

where data(:,:,ex) is OxT where O is the size of the observation vector.
Finally, use
[path] = viterbi_path(prior, transmat, B);

## HMMs with Gaussian outputs

This is just like the mixture of Gaussians case,
except we have M=1, and hence there is no mixing matrix.
## Online EM for discrete HMMs/ POMDPs

For some applications (e.g., reinforcement learning/ adaptive
control), it is necessary to learn a model online.
The script dhmm_em_online_demo gives an example of how to do this.