Hidden Markov Models#
Today we will take a look at Hidden Markov Models (HMMs). HMMs deal with Markov processes in which the states are unobservable or hidden but influence an observable process.
HMMs are used in various fields that include bioinformatics, finance, robotics, developmental studies, speech recognition, and Natural Language Processing (NLP).
A concrete example from a developmental study is the modelling of infant free-play regimes. While it is not possible to directly observe whether an infant is focused or exploring, a prediction can be made based on the number of toys they interact with during a certain period of time.
One application of HMMs in computational finance is modelling stock market states, such as a bull market (stock prices are rising and investors are optimistic) and a bear market (prices are declining). It is impossible to observe the state of the market directly, but put/call ratios, which are indicators of investor sentiment and are associated with short-term stock market returns, can be used to predict it.
In NLP, HMMs are often used for part-of-speech (POS) tagging, the process of assigning a grammatical category, such as noun, verb, or adjective, to each word in a piece of text. While this task seems relatively easy to humans who speak the language in question, it is much harder for a machine. For instance, consider the following sentences:
I should book my flight to Paris.
I am reading an interesting book.
The word book is a verb in the first sentence and a noun in the second. There are many ambiguous words like book, for which POS tagging is not a trivial task.
We will look into HMMs for POS tagging in more detail in the next lecture.
Problem definition#
For now we will focus on the following example.
Example: Umbrella problem. A student is studying for an exam in a room without any windows. Every day she wants to know whether it is rainy or sunny outside. However, her only access to the outside world is when she sees her housemate leaving the house with or without an umbrella each morning.
This problem is a little unrealistic but has the components we are interested in.
The student doesn’t have direct access to the outside world, but wants to know whether it is sunny or rainy. Therefore, the set

$$S = \{\text{Sunny}, \text{Rainy}\}$$

represents the hidden states.
Instead of observing the hidden states directly, the student observes a signal emitted by the hidden states - whether the housemate carries an umbrella or not. Thus,

$$\{\text{Umbrella}, \text{No umbrella}\}$$

is the set of possible observations.
The signal that the student observes is noisy. For example, even if it’s not raining, the housemate might be bringing an umbrella, because he forgot to check the weather report.
Model assumptions#
To be able to model the umbrella problem, we need to represent it as a discrete-time process. That is, we need to specify a time step between the events.
Furthermore, we need the following assumptions to create an HMM.
Markov property: the weather at day $t$ depends only on the weather at day $t-1$.
Stationarity: the probability of transitioning from one hidden state to another is the same for every time step.
Output independence: the observation at day $t$ depends only on the hidden state at day $t$ (see the compact statement below).
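In symbols, writing $x_t$ for the hidden state and $y_t$ for the observation at day $t$ (the notation used in the remainder of this lecture), the three assumptions read

$$P(x_t \mid x_{t-1}, \dots, x_1) = P(x_t \mid x_{t-1}), \qquad P(x_t = i \mid x_{t-1} = j) = a_{ij} \ \text{for all } t, \qquad P(y_t \mid x_{1:t}, y_{1:t-1}) = P(y_t \mid x_t),$$

where the constants $a_{ij}$ are the transition probabilities collected in the transition matrix introduced below.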
State-transition diagram#
Let us visualize the model and assign the probabilities.
The diagram shows that, for instance, the probability of the housemate bringing an umbrella on a sunny day is 0.2.
In addition, we assume that the initial probabilities are 0.6 and 0.4 for a sunny day and a rainy day, respectively.
Decoding#
In general, there are three types of questions that can be asked about HMMs.
Evaluation: What is the probability of an observed sequence?
Decoding: What is the most likely series of states to generate an observed sequence?
Learning: How can we learn the parameters of HMM given some data?
We will focus on decoding problems.
In particular, the question we are going to address in this lecture is the following:
Given the model depicted in the state-transition diagram and the sequence of observations (No umbrella, Umbrella, Umbrella, No umbrella), what is the most likely sequence of hidden states, i.e., what was the weather on each of the four days?
Transition and emission matrices#
The state-transition diagram provides an accessible way to present the HMM for the umbrella problem. However, this is only the case for problems with a small number of hidden and observable states. Generally, matrix notation is used to describe the model.
Let us start with the transition matrix $A$, whose entry $a_{ij}$ is the probability of moving from hidden state $j$ to hidden state $i$, i.e., $a_{ij} = P(x_t = i \mid x_{t-1} = j)$.
The transition matrix for the umbrella problem is equal to

$$A = \begin{pmatrix} 0.7 & 0.4 \\ 0.3 & 0.6 \end{pmatrix},$$

where the first row and column correspond to 'Sunny' and the second to 'Rainy'.
Note that every column of $A$ sums to one, i.e., $A$ is column stochastic.
The initial probabilities are given by

$$\pi = \begin{pmatrix} 0.6 \\ 0.4 \end{pmatrix}.$$
Next, we construct matrix $B$, whose entry $b_{ij}$ is the probability of observing signal $i$ given hidden state $j$, i.e., $b_{ij} = P(y_t = i \mid x_t = j)$.
$B$ is also column stochastic and is known as the emission matrix.
For the umbrella example,

$$B = \begin{pmatrix} 0.8 & 0.4 \\ 0.2 & 0.6 \end{pmatrix},$$

where the first row corresponds to 'No umbrella' and the second row to 'Umbrella'.
Together, $A$, $B$, and $\pi$ fully specify the HMM for the umbrella problem.
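As a quick sketch (using NumPy and the same variable names as the implementation later in this lecture), the model parameters can be stored as arrays and the column sums checked:

import numpy as np

A = np.array([[0.7, 0.4],   # transition matrix: column j holds P(next state | current state j)
              [0.3, 0.6]])
B = np.array([[0.8, 0.4],   # emission matrix: row 0 = "No umbrella", row 1 = "Umbrella"
              [0.2, 0.6]])
Pi = np.array([[0.6],       # initial state distribution: Sunny, Rainy
               [0.4]])

print(A.sum(axis=0), B.sum(axis=0))   # [1. 1.] [1. 1.] -- both matrices are column stochastic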
Brute-force approach#
Recall that we want to decode the following sequence of observations: (No umbrella, Umbrella, Umbrella, No umbrella).
Of course, it is possible to answer this question by directly computing the joint probability of the observed sequence and each possible sequence of hidden states, and then picking the sequence of hidden states with the highest joint probability.
Let’s find an expression for the joint probability for a general sequence of observations of length 4, $y_{1:4} = (y_1, y_2, y_3, y_4)$, and a sequence of hidden states $x_{1:4} = (x_1, x_2, x_3, x_4)$.
This can be written as

$$P(x_{1:4}, y_{1:4}) = P(x_1)\,P(y_1 \mid x_1) \prod_{t=2}^{4} P(x_t \mid x_{t-1})\,P(y_t \mid x_t) = \pi_{x_1} b_{y_1 x_1} \prod_{t=2}^{4} a_{x_t x_{t-1}}\, b_{y_t x_t}.$$
All we did to obtain this expression was use the definition of conditional probability, the output-independence assumption, and the Markov-property assumption.
Similarly, for a general sequence of observations of length $T$,

$$P(x_{1:T}, y_{1:T}) = \pi_{x_1} b_{y_1 x_1} \prod_{t=2}^{T} a_{x_t x_{t-1}}\, b_{y_t x_t}.$$

For example, the joint probability of the observed sequence (No umbrella, Umbrella, Umbrella, No umbrella) and the hidden sequence (Sunny, Rainy, Rainy, Sunny) is $0.6 \cdot 0.8 \cdot 0.3 \cdot 0.6 \cdot 0.6 \cdot 0.6 \cdot 0.4 \cdot 0.8 \approx 0.00995$.
Imagine a problem with $N$ hidden states and an observed sequence of length $T$. The brute-force approach has to evaluate the joint probability of all $N^T$ possible sequences of hidden states, which quickly becomes computationally infeasible; the sketch below makes the enumeration explicit.
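To make the cost concrete, here is a minimal brute-force decoder (an illustrative sketch, not part of the implementation below; the function name brute_force_decode is ours, and it assumes the arrays A, B, and Pi defined above):

from itertools import product
import numpy as np

def brute_force_decode(y, A, B, Pi):
    N = A.shape[1]                                  # number of hidden states
    T = len(y)                                      # length of the observed sequence
    best_prob, best_path = -1.0, None
    for path in product(range(N), repeat=T):        # all N**T candidate hidden sequences
        p = Pi[path[0], 0] * B[y[0], path[0]]       # initial probability times first emission
        for t in range(1, T):
            p *= A[path[t], path[t - 1]] * B[y[t], path[t]]   # transition times emission
        if p > best_prob:
            best_prob, best_path = p, path
    return np.array(best_path) + 1, best_prob       # hidden state indices starting with 1

print(brute_force_decode([0, 1, 1, 0], A, B, Pi))   # most likely path [1 2 2 1], probability ~0.00995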
The Viterbi algorithm#
Fortunately, there are algorithms that are considerably less computationally expensive than the brute-force approach. One of these algorithms is the Viterbi algorithm, which has a computational time complexity of $O(N^2 T)$.
The Viterbi algorithm finds the solution to the decoding problem in a step-wise manner. Each step $t$ determines, for every hidden state, the probability of the most likely path ending in that state at step $t$, and stores the hidden state at step $t-1$ from which this path arrives.
The Viterbi algorithm is recursive, because the optimization is performed based on the HMM and the preceding step. The stored hidden states are used at the final stage of the Viterbi algorithm to find the optimal path. This path is known as the Viterbi path.
The Viterbi algorithm can be split into three main stages:
Initialization,
Forward pass,
Backward pass.
Before we proceed with the discussion of the stages of the Viterbi algorithm, we need to introduce its auxiliary matrices, $C$ and $D$, both of size $N \times T$. Matrix $C$ accumulates the probabilities of the most likely paths ending in each hidden state at each step, while matrix $D$ records the preceding hidden states of those paths.
Initialization#
The initialization stage populates the first column of matrix $C$ and sets the first column of matrix $D$ to zero, since there is no preceding hidden state at the first step.
To find the first column of $C$, we multiply the initial probability of each hidden state by the probability of that state emitting the first observation:

$$C_{i,1} = \pi_i \, b_{y_1 i}, \qquad i = 1, \dots, N.$$

Equivalently,

$$C_{:,1} = \pi \odot B_{y_1,:}^{\,T},$$

where $\odot$ denotes element-wise multiplication and $B_{y_1,:}$ is the row of $B$ corresponding to the first observation.
Since for the umbrella problem the first observation is 'No umbrella', we use the first row of $B$.
The first column of $C$ is therefore

$$C_{:,1} = \begin{pmatrix} 0.6 \cdot 0.8 \\ 0.4 \cdot 0.4 \end{pmatrix} = \begin{pmatrix} 0.48 \\ 0.16 \end{pmatrix}.$$

Therefore, matrices $C$ and $D$ after the initialization stage are

$$C = \begin{pmatrix} 0.48 & \cdot & \cdot & \cdot \\ 0.16 & \cdot & \cdot & \cdot \end{pmatrix}, \qquad D = \begin{pmatrix} 0 & \cdot & \cdot & \cdot \\ 0 & \cdot & \cdot & \cdot \end{pmatrix},$$

where the dots denote entries that are filled in during the forward pass.
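In NumPy, with the arrays defined above and the encoding used in the implementation below (0 = 'No umbrella', 1 = 'Umbrella'), this stage is a one-liner:

y = [0, 1, 1, 0]              # observed sequence: No umbrella, Umbrella, Umbrella, No umbrella
print(B[y[0], :] * Pi.T)      # [[0.48 0.16]] -- the first column of C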
Forward pass#
The forward pass completes both auxiliary matrices. The components of matrix $C$ are computed column by column as

$$C_{i,t} = \max_{j} \; b_{y_t i} \, a_{ij} \, C_{j,t-1}, \qquad t = 2, \dots, T.$$

Equivalently, we can write

$$C_{i,t} = b_{y_t i} \max_{j} \; a_{ij} \, C_{j,t-1},$$

where the emission probability $b_{y_t i}$ is taken out of the maximization because it does not depend on $j$.
The components of matrix $D$ store the maximizing indices:

$$D_{i,t} = \arg\max_{j} \; a_{ij} \, C_{j,t-1}.$$
Let us perform the forward pass for the umbrella problem.
Since the second observed value is 'Umbrella', we use the second row of $B$:

$$C_{i,2} = b_{\text{Umbrella},\, i} \max_{j} \; a_{ij} \, C_{j,1}.$$

The above expression can be written as

$$C_{1,2} = 0.2 \cdot \max(0.7 \cdot 0.48, \; 0.4 \cdot 0.16), \qquad C_{2,2} = 0.6 \cdot \max(0.3 \cdot 0.48, \; 0.6 \cdot 0.16).$$

Substituting the values from matrices $A$, $B$, and the first column of $C$, we obtain $C_{1,2} = 0.2 \cdot 0.336 = 0.0672$ and $C_{2,2} = 0.6 \cdot 0.144 = 0.0864$.
Note that the value of $j$ attaining the maximum is $1$ (Sunny) in both cases, so the second column of $D$ is $(1, 1)^T$.
The remaining components of $C$ and $D$ are computed in the same way, resulting in

$$C = \begin{pmatrix} 0.48 & 0.0672 & 0.009408 & 0.00995328 \\ 0.16 & 0.0864 & 0.031104 & 0.00746496 \end{pmatrix}, \qquad D = \begin{pmatrix} 0 & 1 & 1 & 2 \\ 0 & 1 & 2 & 2 \end{pmatrix}.$$
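Continuing the NumPy sketch from the initialization stage, the second columns of $C$ and $D$ can be reproduced with the same broadcasting trick used in the implementation below:

C1 = np.array([0.48, 0.16])                 # first column of C from the initialization
step = B[y[1], :, np.newaxis] * A * C1      # entry (i, j) equals b_{y_2 i} * a_{ij} * C_{j,1}
print(np.max(step, axis=1))                 # [0.0672 0.0864] -- second column of C
print(np.argmax(step, axis=1) + 1)          # [1 1] -- second column of D (states indexed from 1)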
Backward pass#
The backward pass constructs the path that provides the optimal sequence of hidden states for the observed sequence.
First, the backward pass finds the row that contains the highest probability in the last column of matrix $C$. This row gives the last hidden state of the most likely sequence.
Once the last hidden state is known, the algorithm walks backwards through matrix $D$, reading off the preceding hidden state at every step.
In our case, it is the first row, which corresponds to hidden state 'Sunny': the weather on day 4 is most likely sunny.
Moreover, $D_{1,4} = 2$.
This value directs us to the second row of matrix $D$, i.e., the hidden state at day 3 is 'Rainy'.
Since

$$D_{2,3} = 2 \quad \text{and} \quad D_{2,2} = 1,$$

we find that the hidden state at day 2 is 'Rainy' and the hidden state at day 1 is 'Sunny'.
The complete path through matrix $D$ is therefore $1 \to 2 \to 2 \to 1$.
The corresponding sequence of hidden states is (Sunny, Rainy, Rainy, Sunny).
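As a short sketch (reusing the completed matrices $C$ and $D$ from the forward pass), the backtracking takes only a few lines:

C = np.array([[0.48, 0.0672, 0.009408, 0.00995328],
              [0.16, 0.0864, 0.031104, 0.00746496]])
D = np.array([[0, 1, 1, 2],
              [0, 1, 2, 2]])
path = [int(np.argmax(C[:, -1])) + 1]       # most probable hidden state at the last step
for t in range(C.shape[1] - 1, 0, -1):      # walk backwards through D
    path.insert(0, int(D[path[0] - 1, t]))
print(path)                                 # [1, 2, 2, 1] -> Sunny, Rainy, Rainy, Sunny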
Python implementation of the Viterbi algorithm#
import numpy as np

def Viterbi(y, A, B, Pi):
    N = A.shape[1]  # cardinality of the state space
    T = len(y)      # length of the observed sequence
    # Initialize C & D
    C = np.empty((N, T), 'd')  # 'd' stands for type double
    D = np.empty((N, T), 'B')  # 'B' stands for type unsigned integer
    # Initialization stage
    C[:, 0] = B[y[0], :] * Pi.T
    D[:, 0] = 0
    # Forward pass
    for i in range(1, T):
        C[:, i] = np.max(B[y[i], :, np.newaxis] * A * C[:, i - 1], 1)
        D[:, i] = np.argmax(B[y[i], :, np.newaxis] * A * C[:, i - 1], 1)
    D[:, 1:] = D[:, 1:] + 1  # hidden state indices start with 1
    # Backward pass
    x = np.empty(T, 'B')
    x[-1] = np.argmax(C[:, T - 1]) + 1  # most probable hidden state at the last step
    for i in reversed(range(1, T)):
        x[i - 1] = D[x[i] - 1, i]
    return x, C, D
# HMM and observed sequence
A = np.array([[0.7,0.4],[0.3,0.6]])
B = np.array([[0.8, 0.4],[0.2, 0.6]])
Pi = np.array([[0.6],[0.4]])
print("Transition matrix: \n", A)
print("Emission matrix: \n", B)
print("Initial state distribution: \n", Pi)
O = np.array([0,1,1,0])
print("Observed sequence: \n", O)
Transition matrix:
[[0.7 0.4]
[0.3 0.6]]
Emission matrix:
[[0.8 0.4]
[0.2 0.6]]
Initial state distribution:
[[0.6]
[0.4]]
Observed sequence:
[0 1 1 0]
# Result
X, C, D = Viterbi(O,A,B,Pi)
print("Matrix C: \n", C)
print("Matrix D: \n", D)
print("Answer: \n", X)
Matrix C:
[[0.48 0.0672 0.009408 0.00995328]
[0.16 0.0864 0.031104 0.00746496]]
Matrix D:
[[0 1 1 2]
[0 1 2 2]]
Answer:
[1 2 2 1]
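The answer [1 2 2 1] corresponds to the hidden-state sequence (Sunny, Rainy, Rainy, Sunny), exactly the Viterbi path recovered by hand in the backward pass above. If the brute-force sketch from earlier has been defined, the result can also be cross-checked:

print(brute_force_decode(O, A, B, Pi))   # (array([1, 2, 2, 1]), ~0.00995) -- agrees with the Viterbi path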
Summary#
HMMs deal with Markov processes in which the states are hidden but influence an observable process. HMMs satisfy the following assumptions:
Markov property,
stationarity,
output independence.
HMMs are fully described by the initial state distribution, the transition matrix, and the emission matrix.
A decoding problem answers the question "What is the most likely series of states to generate an observed sequence?"
It can be solved using a computationally expensive brute-force approach or by more advanced algorithms, such as the Viterbi algorithm.
The Viterbi algorithm is a recursive, step-wise procedure that can be split into three main parts:
initialization,
forward pass,
backward pass.