Hidden Markov Models (part 2)#

Example: Baby-noise problem#

Let us look at another example.

Baby monitors allow new parents to combine caring for their baby with other activities while the baby is napping. Many parents judge whether the baby is awake or asleep by how much noise she makes over a certain period of time, say 2 minutes. For this exercise we assume three possible noise levels:

  • 0 - quiet;

  • 1 - little noise;

  • 2 - much noise.

The table below gives the probabilities of different noise levels for when the baby is awake and when she is asleep.

| state \ noise level | 0    | 1   | 2    |
|---------------------|------|-----|------|
| Awake               | 0.5  | 0.3 | 0.2  |
| Asleep              | 0.85 | 0.1 | 0.05 |

Moreover, we assume that if the baby was asleep during the last 2 minutes, she will remain asleep with probability 80%, whereas if she was awake during the last 2 minutes, she will remain awake with probability 60%. Finally, we assume that when the parents start monitoring the baby, the probability that she is asleep is 70%. Find the states of the baby over the last 6 minutes if first there was much noise, then little noise, and then it was quiet.

The corresponding state-transition diagram is given below.

[Figure: state-transition diagram of the baby-monitor HMM (HMM_baby_monitor.svg)]

This model can be described by the following matrices:

\[\begin{split}\pi = \begin{bmatrix} 0.3 \\ 0.7 \end{bmatrix}, A = \begin{bmatrix} 0.6 & 0.2\\ 0.4 & 0.8 \end{bmatrix}, \text{ and } B = \begin{bmatrix} 0.5 & 0.85 \\ 0.3 & 0.1 \\ 0.2 & 0.05 \end{bmatrix}.\end{split}\]

The observed sequence is \(O = (2,1,0).\) We are asked to find the corresponding hidden states \(X = (x_0, x_1, x_2).\)
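
To make the calculations below easy to verify, here is a minimal sketch of the model in Python (assuming NumPy is available; the names `pi`, `A`, `B`, and `O` mirror the notation above, but note that indices in code are 0-based, whereas the text uses 1-based indices):

```python
import numpy as np

# 0-based encoding: state 0 = Awake, state 1 = Asleep; noise levels 0, 1, 2.
pi = np.array([0.3, 0.7])        # pi[j] = P(x_0 = j)
A = np.array([[0.6, 0.2],        # A[i, j] = P(x_n = i | x_{n-1} = j)
              [0.4, 0.8]])
B = np.array([[0.5, 0.85],       # B[k, j] = P(o_n = k | x_n = j)
              [0.3, 0.1],
              [0.2, 0.05]])
O = [2, 1, 0]                    # much noise, then little noise, then quiet
```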

Initialization#

Recall that the joint probability of \(O\) and \(X\) of length 3 is given by

\[P(O,X) = \left(\prod_{n=0}^{2} P(o_n | x_n)\right) \left(\prod_{n=1}^{2} P(x_n | x_{n-1})\right) P(x_0).\]

It can be rewritten as

\[P(O,X) = P(o_2 | x_2) P(o_1 | x_1) P(o_0 | x_0) P(x_2 | x_1) P(x_1 | x_0) P(x_0).\]
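
For completeness, here is this formula as a small sketch in code (`joint_prob` is a hypothetical helper reusing the NumPy encoding above):

```python
def joint_prob(O, X):
    """P(O, X) for observations O and a 0-based hidden-state sequence X."""
    p = pi[X[0]] * B[O[0], X[0]]                # P(x_0) * P(o_0 | x_0)
    for n in range(1, len(O)):
        p *= A[X[n], X[n - 1]] * B[O[n], X[n]]  # P(x_n | x_{n-1}) * P(o_n | x_n)
    return p

joint_prob([2, 1, 0], [1, 1, 1])                # P(O, X) for X = (As, As, As): 0.001904
```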

In the initialization step of the Viterbi algorithm we compute the first column of \(C\), which contains

\[C_{i1} = P(o_0 | x_0 = q_i) P(x_0 = q_i).\]

Note that \(P(x_0 = q_i)\) is the last term and \(P(o_0 | x_0 = q_i)\) is the third term from the left in the joint probability formula.

For the hidden state Awake (\(Aw\)) we find

\[C_{11} = P(2|Aw)P(Aw) = B_{31}\pi_1 = 0.2\cdot0.3 = 0.06.\]

For the hidden state Asleep (\(As\)) we obtain

\[C_{21} = P(2|As)P(As) = B_{32}\pi_2 = 0.05\cdot0.7 = 0.035.\]
[Figure: Viterbi trellis after initialization (Viterbi_diagram_initialization_baby_monitor.svg)]

Therefore, the first columns of \(C\) and \(D\) become

\[\begin{split}C = \begin{bmatrix}0.06 & \ast & \ast \\ 0.035 & \ast & \ast \end{bmatrix} \text{ and } D = \begin{bmatrix}0 & \ast & \ast \\ 0 & \ast & \ast \end{bmatrix}.\end{split}\]
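
In code, the initialization is a single vectorized step that fills the first column of \(C\) (a sketch continuing the NumPy encoding above; \(D\) starts out as all zeros, since there is no predecessor state at time step 0):

```python
N, T = A.shape[0], len(O)        # number of hidden states, length of O
C = np.zeros((N, T))             # C[i, t]: best joint probability of a path ending in state i at step t
D = np.zeros((N, T), dtype=int)  # D[i, t]: backpointer to the best previous state

# Initialization: C[i, 0] = P(o_0 | x_0 = q_i) * P(x_0 = q_i)
C[:, 0] = B[O[0], :] * pi        # [0.2 * 0.3, 0.05 * 0.7] = [0.06, 0.035]
```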

Forward pass#

The forward pass fills in the remaining columns of both auxiliary matrices, one time step at a time.
First, we compute the probabilities required for the second observation in the sequence \(O\), namely \(o_1 = 1\) (little noise).

\[P(1|Aw)P(Aw|Aw) = B_{21}A_{11} = 0.3\cdot0.6 = 0.18.\]
\[P(1|As)P(As|Aw) = B_{22}A_{21} = 0.1\cdot0.4 = 0.04.\]
\[P(1|As)P(As|As) = B_{22}A_{22} = 0.1\cdot0.8 = 0.08.\]
\[P(1|Aw)P(Aw|As) = B_{21}A_{12} = 0.3\cdot0.2 = 0.06.\]
[Figure: Viterbi trellis, transition and emission terms for the first forward step (Viterbi_diagram_forward1a_baby_monitor.svg)]

Then, we can find the entries in the second column of \(C\) in a recursive manner (i.e., using the entries of its first column).

\[C_{12} = \max\left(P(1|Aw)P(Aw|Aw)C_{11}, P(1|Aw)P(Aw|As)C_{21} \right)\]
\[ = \max(0.18\cdot 0.06, 0.06\cdot 0.035) = \max(0.0108, 0.0021) = 0.0108.\]

Similarly, we obtain \(C_{22}\):

\[C_{22} = \max(0.04\cdot 0.06, 0.08\cdot 0.035) = \max(0.0024, 0.0028) = 0.0028.\]

Note that \(C_{i2}\) corresponds to the part of \(P(O,X)\) that is shown in bold

\[P(O,X) = P(o_2 | x_2) \mathbf{P(o_1 | x_1) P(o_0 | x_0)} P(x_2 | x_1) \mathbf{P(x_1 | x_0) P(x_0)}.\]

Since the hidden state at time step 0 that maximizes \(C_{12}\) is \(q_1 = Aw\), we set \(D_{12} = 1\).
Similarly, since the hidden state at time step 0 that leads to the maximum value of \(C_{22}\) is \(q_2 = As\), we set \(D_{22} = 2\).

[Figure: Viterbi trellis after the first forward step (Viterbi_diagram_forward1b_baby_monitor.svg)]
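
The same update can be written compactly in code. This sketch (continuing the encoding above) computes the second columns of \(C\) and \(D\) in one vectorized step; remember that the backpointers come out 0-based:

```python
t = 1                                # second observation, o_1 = 1 (little noise)
# scores[i, j] = P(o_1 | x_1 = q_i) * P(x_1 = q_i | x_0 = q_j) * C[j, 0]
scores = B[O[t], :, None] * A * C[:, t - 1]
C[:, t] = scores.max(axis=1)         # [0.0108, 0.0028]
D[:, t] = scores.argmax(axis=1)      # [0, 1], i.e. (1, 2) in the text's 1-based indexing
```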

We repeat the calculations for the final observation in the sequence \(O\), \(o_2 = 0\).

\[P(0|Aw)P(Aw|Aw) = B_{11}A_{11} = 0.5\cdot0.6 = 0.3.\]
\[P(0|As)P(As|Aw) = B_{12}A_{21} = 0.85\cdot0.4 = 0.34.\]
\[P(0|As)P(As|As) = B_{12}A_{22} = 0.85\cdot0.8 = 0.68.\]
\[P(0|Aw)P(Aw|As) = B_{11}A_{12} = 0.5\cdot0.2 = 0.1.\]

Next, we compute the last column of matrix \(C\).

\[C_{13} = \max(0.3\cdot0.0108,0.1\cdot 0.0028) = 0.00324.\]
\[C_{23} = \max(0.34\cdot0.0108,0.68\cdot0.0028) = 0.003672.\]

This time \(C_{i3}\) corresponds to the full expression for the joint probability \(P(O,X)\), maximized over all state sequences that end in \(q_i\).

The maxima of both \(C_{13}\) and \(C_{23}\) are attained with \(x_1 = Aw\). Thus,

\[D_{13} = D_{23} = 1.\]
[Figure: Viterbi trellis after the second forward step (Viterbi_diagram_forward2_baby_monitor.svg)]

It follows that

\[\begin{split}C = \begin{bmatrix} 0.06 & 0.0108 & 0.00324 \\ 0.035 & 0.0028 & 0.003672 \end{bmatrix} \text{ and } D = \begin{bmatrix} 0 & 1 & 1 \\ 0 & 2 & 1 \end{bmatrix}.\end{split}\]

This completes the forward pass.
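
In code, the whole forward pass is the same vectorized update in a loop over the remaining time steps (a sketch under the assumptions above):

```python
for t in range(1, T):
    scores = B[O[t], :, None] * A * C[:, t - 1]
    C[:, t] = scores.max(axis=1)
    D[:, t] = scores.argmax(axis=1)

print(C)   # [[0.06   0.0108  0.00324 ]
           #  [0.035  0.0028  0.003672]]
print(D)   # 0-based backpointers (the text's values minus 1; the first column is unused):
           # [[0 0 0]
           #  [0 1 0]]
```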

Backward pass#

It remains to perform the backward pass. That is, we need to find the Viterbi path.

From the last column of \(C\) we know that \(s = 2\), because the probability stored in the second row is higher than the probability stored in the first row of that column.

This implies that the reverse search for the optimal path starts at entry (2,3) of matrix \(D\). This entry is located in the second row of \(D\), which corresponds to the state Asleep, so we conclude that \(x_2 = As\). Its value, \(D_{23} = 1\), leads us to the first row of the second column, entry \(D_{12}\). Because this entry lies in the first row, \(x_1 = Aw\), and its value, \(D_{12} = 1\), leads us to entry \(D_{11}\) in the first row. Hence, \(x_0 = Aw\). The complete path through matrix \(D\) is shown below.

\[\begin{split}D = \begin{bmatrix} \mathbf{0} & \mathbf{1} & 1 \\ 0 & 2 & \mathbf{1} \end{bmatrix}.\end{split}\]

The answer to our decoding problem is \(X = (Aw, Aw, As).\)
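
As a sanity check, multiplying out the joint probability of this path, in the order of the formula above, reproduces the maximal entry in the last column of \(C\):

\[P(O,X) = P(0|As)P(1|Aw)P(2|Aw)P(As|Aw)P(Aw|Aw)P(Aw) = 0.85\cdot0.3\cdot0.2\cdot0.4\cdot0.6\cdot0.3 = 0.003672 = C_{23}.\]

Equivalently, `joint_prob([2, 1, 0], [0, 0, 1])` in the sketch above returns the same value.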

Note that in this example the same path can also be found by taking the largest entry in each column of \(C\); in general, however, the Viterbi path should be recovered from the backpointers in \(D\).
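
In code, the backward pass is a short loop over the backpointers (again a sketch under the 0-based encoding above):

```python
path = np.zeros(T, dtype=int)
path[-1] = C[:, -1].argmax()        # s: most probable final state (here 1, i.e. Asleep)
for t in range(T - 1, 0, -1):
    path[t - 1] = D[path[t], t]     # follow the backpointer one step to the left

states = ["Awake", "Asleep"]
print([states[i] for i in path])    # ['Awake', 'Awake', 'Asleep']
print(C[:, -1].max())               # 0.003672, the joint probability of this path
```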