{ "cells": [ { "cell_type": "code", "execution_count": 3, "id": "d4c0cc67", "metadata": { "hide_input": true, "slideshow": { "slide_type": "skip" }, "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "%matplotlib inline\n", "%config InlineBackend.figure_format='retina'\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "from IPython.display import Image, display_html, display, Math, HTML" ] }, { "cell_type": "markdown", "id": "2f8d3b5e", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Markov Chains" ] }, { "cell_type": "markdown", "id": "c19bf3f2", "metadata": { "hide_input": true, "slideshow": { "slide_type": "slide" } }, "source": [ "Today we will take a temporary pause from Bayesian statistics to discuss a fundamental tool in probability." ] }, { "cell_type": "markdown", "id": "a84d1402", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Consider the following model for the weather in Boston: \n", "\n", ">If today is sunny, tomorrow will be sunny with probability 0.8; otherwise it will be rainy. If today is rainy, tomorrow will be rainy with probabililty 0.6; otherwise it will be sunny." ] }, { "cell_type": "markdown", "id": "22524230", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can draw this scenario with a diagram like this:\n", "\n", "
\n", "\n", "\"Figure\"\n", " \n", "
\n" ] }, { "cell_type": "markdown", "id": "c37bcd4c", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Why is this useful? It helps us reason about what might happen many days in the future." ] }, { "cell_type": "markdown", "id": "4e28e0b7", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "For example, we can ask \"what is the probability that if today is sunny, tomorrow is sunny and the day after this is rainy\" by imaging taking a walk along this graph following the arrows, and multiplying the probability each time." ] }, { "cell_type": "markdown", "id": "6a76a7a5", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The great thing is we can do this at the same time for all possible starting conditions for an aribtrary number of days into the future! We'll do this by combining our knowledge of linear algebra and probability." ] }, { "cell_type": "markdown", "id": "e0b90ebd", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "This tool is called the __Markov Chain__ after its inventor Andrei Markov, a Russian mathematician working in the late 19th / early 20th century." ] }, { "cell_type": "markdown", "id": "487310c8", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "```{margin}\n", "AAMarkov by unknown; \n", "originally uploaded to en-wikipedia by \n", "en:User:Mhym \n", " - From the official web site of the Russian Academy of Sciences: http://www.ras.ru/win/db/show_per.asp?P=.id-53175.ln-en.dl-.pr-inf.uk-0. \n", " Licensed under Public Domain via Wikimedia Commons.\n", "```" ] }, { "cell_type": "markdown", "id": "dd3d966f", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "
\n", "\n", "\"Figure\"\n", " \n", "
\n", "\n", "
\n", "\n", "
\n", "Andrei Markov, 1856 - 1922, St Petersburg.\n", "
" ] }, { "cell_type": "markdown", "id": "4ec57d25", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Amazingly, we will eventually be able to use Markov chains to help us with the grid search problem we encountered in Bayesian inference, but it will take some background to get there." ] }, { "cell_type": "markdown", "id": "67fe7bbf", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "```{margin}\n", "Good references for this lecture and next two, with a more formal treatment, are Chapters 2 and 3 of [these lecture notes](https://personal.math.ubc.ca/~holmescerfon/teaching.html#asanotes). These notes draw on those references in places.\n", "```" ] }, { "cell_type": "markdown", "id": "16d5f4f7", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "A Markov chain is a model of a system that evolves in time. For our purposes we will assume time is discrete, meaning that the system evolves in steps that can be denoted with integers." ] }, { "cell_type": "markdown", "id": "bf80532e", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "At any given time, the Markov chain is in a particular _state_. The concept of a state is very flexible. We can use the idea of state to represent a lot of things: maybe a location in space, or a color, or a number of people in a room, or a particular value of a parameter. Anything that can be modeled as a discrete thing can be a state." ] }, { "cell_type": "markdown", "id": "13f38668", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The basic idea of a Markov chain is that it moves from state to state __probabilistically.__ In any state, there are certain probabilities of moving to each of the other states. " ] }, { "cell_type": "markdown", "id": "e28583e3", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Definitions" ] }, { "cell_type": "markdown", "id": "e858a95c", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We will denote the state that the Markov chain is in at time $n$ as $X_n$. Since the Markov chain evolves probabilistically, $X_n$ is a random variable." ] }, { "cell_type": "markdown", "id": "58e5ae3f", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "A defining aspect of a Markov chain is that each time it moves from the current state to the new state, the probability of that transition __only depends on the current state.__" ] }, { "cell_type": "markdown", "id": "bf04b2d6", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This is called the __Markov property.__" ] }, { "cell_type": "markdown", "id": "15ada1e7", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The point is that whatever states the chain was in previously do not matter for determining where it goes next. Only the current state affects where the chain goes next." ] }, { "cell_type": "markdown", "id": "ea57937e", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Here is an abstract view of the situation at a particular node 0.\n", "\n", "
\n", "\n", "\"Figure\"\n", " \n", "
\n", "\n" ] }, { "cell_type": "markdown", "id": "2ac6662e", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "From state 0, on the next step the chain can go to state 1, state 2, or even back to state 0. There is a probability associated with each transition." ] }, { "cell_type": "markdown", "id": "0e22f60f", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Note that the probabilities must sum to 1 -- on each transition, the chain must go somewhere (even if it remains at the current node)." ] }, { "cell_type": "markdown", "id": "fc525123", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "__Definition.__ Given a finite set of states $S = \\{1, 2, \\dots \\}$, a process $X_0, X_1, \\dots$ is a Markov chain on $S$ if, when the process is in state $j$, the probability that its next state is $i$ depends only on $j$." ] }, { "cell_type": "markdown", "id": "a64b86b7", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Formally, we say that \n", "\n", "$$P(X_{n+1} = i \\,\\vert\\, X_{n} = j, X_{n-1} = x_{n-1}, \\dots, X_0 = x_0) = P(X_{n+1} = i \\,\\vert\\, X_{n} = j).$$\n" ] }, { "cell_type": "markdown", "id": "6758e4ce", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This the formal statement of the __Markov property.__" ] }, { "cell_type": "markdown", "id": "3d415df5", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "It is usually convenient to work with Markov chains using linear algebra. For that, we use this notation:\n", "\n", "__Definition.__ The _transition matrix_ of the chain is the matrix $P$ with $P_{ij} = P(X_{n+1} = i \\,\\vert\\, X_{n} = j).$" ] }, { "cell_type": "markdown", "id": "5c7ce9a3", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Note that by the law of total probability, for any $j$:\n", " \n", "$$ \\sum_i P(X_{n+1} = i \\,\\vert\\, X_{n} = j) = \\sum_i P_{ij} = 1.$$\n", "\n", "Hence, the columns of $P$ each sum to 1. Such a matrix is called a __stochastic__ matrix." ] }, { "cell_type": "markdown", "id": "6ecd19c6", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "__Example 1.__ Let's go back to our weather in Boston example\n", "\n", "
\n", "\n", "\"Figure\"\n", " \n", "
\n" ] }, { "cell_type": "markdown", "id": "fa40e46b", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can see that if at time $t=0$ the chain is in the \"sunny\" state, at time $t=1$ it will either be in a \"sunny\" state (with probability 0.8) or \"rainy\" state (with probability 0.6)." ] }, { "cell_type": "markdown", "id": "18a852fa", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This chain has transition matrix:\n", "\n", "$$ P = \n", "\\begin{bmatrix}\n", ".8 & .4 \\\\\n", ".2 & .6 \\\\\n", "\\end{bmatrix}\n", "$$\n" ] }, { "cell_type": "markdown", "id": "704cddb5", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can confirm that this is a stochastic matrix by observing that the columns sum to 1." ] }, { "cell_type": "markdown", "id": "bef26943", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "__Example 2.__ Looking at a bigger, more abstract example, consider the Markov chain below, with states $\\{0, 1, 2, 3\\}$ and transition probabilities marked.\n", "\n", "
\n", "\n", "\"Figure\"\n", " \n", "
" ] }, { "cell_type": "markdown", "id": "e3ae91f4", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can see that if at time $t= 0$ the chain is in state 1, at time $t = 1$ it will either be in state 2 (with probability 0.6) or state 3 (with probability 0.4)." ] }, { "cell_type": "markdown", "id": "f5c00a8c", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This chain has transition matrix:\n", "\n", "$$ P = \n", "\\begin{bmatrix}\n", "0 & 0 & 0 & 0 \\\\\n", "0.7 & 0 & 1 & 0 \\\\\n", "0.3 & 0.6 & 0 & 0 \\\\\n", "0 & 0.4 & 0 & 1 \n", "\\end{bmatrix}\n", "$$\n" ] }, { "cell_type": "markdown", "id": "1aac6cc1", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Again, we can confirm that this is a stochastic matrix by summing the columns." ] }, { "cell_type": "markdown", "id": "ec2f8aeb", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Future States" ] }, { "cell_type": "markdown", "id": "65a8fca2", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "OK, so we know that if the chain is in state $j$, the probability that it will be in state $i$ at the next step is $P_{ij}$." ] }, { "cell_type": "markdown", "id": "584c5702", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "What is the probability it will be in state $i$ in __two__ steps? three steps? $n$ steps?" ] }, { "cell_type": "markdown", "id": "418c5d10", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Let's __define__ the $n$-step transition probability matrix $P^{(n)}$ as\n", "\n", "$$ P^{(n)}_{ij} = P(X_n = i\\,\\vert\\,X_0 = j). $$\n" ] }, { "cell_type": "markdown", "id": "da8c4463", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "How can we compute $P^{(n)}$?" ] }, { "cell_type": "markdown", "id": "d46f6792", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Let's compute it for the simple case of $n = 2$:\n", "\n", "$$ P^{(2)}_{ij} = P(X_2 = i\\,\\vert\\,X_0 = j) $$\n", "\n", "$$ = \\sum_k P(X_2 = i\\,\\vert\\,X_1 = k, X_0 = j)\\,P(X_1 = k\\,\\vert\\,X_0 = j)$$\n", "\n", "(This is by the law of total probability.) " ] }, { "cell_type": "markdown", "id": "4b0da992", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "which is then:\n", "\n", "$$ = \\sum_k P(X_2 = i\\,\\vert\\,X_1 = k)\\,P(X_1 = k\\,\\vert\\,X_0 = j).$$\n", "\n", "(By the Markov property)" ] }, { "cell_type": "markdown", "id": "881aa654", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Now let's write the expression above in terms of matrices:\n", "\n", "$$P^{(2)}_{ij} = \\sum_k P_{ik}P_{kj}$$\n", "\n", "Which turns out to be the definition of matrix multiplication." ] }, { "cell_type": "markdown", "id": "f073bf01", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "So $$P(X_2 = i\\,\\vert\\,X_0 = j) = P^{(2)}_{ij} = P^2_{ij} $$" ] }, { "cell_type": "markdown", "id": "bb25e630", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "In other words, the two-step transition probabilities are given by the square of the $P$ matrix. " ] }, { "cell_type": "markdown", "id": "420dc68d", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "You can see that this generalizes: the $n$-step transition probabilities are therefore given by the $n$th power of $P$.\n", " \n", "$$P^{(n)} = P^n. 
$$" ] }, { "cell_type": "markdown", "id": "d5300e0a", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "__Example 1.__ Looking at weather in Boston again:\n", "\n", "
\n", "\n", "\"Figure\"\n", " \n", "
\n" ] }, { "cell_type": "markdown", "id": "94992c86", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "If at time $t=0$ the chain is in a \"sunny\" state, at time $t=2$ it will either be in \"sunny\" state (with probability 0.72) or a \"rainy\" state (with probability 0.28)." ] }, { "cell_type": "markdown", "id": "f38f02cb", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can confirm that by computing $P^2$:\n", "\n", "$$ P^2 = \n", "\\begin{bmatrix}\n", "0.8 & 0.4 \\\\\n", "0.2 & 0.6 \\\\\n", "\\end{bmatrix} * \\begin{bmatrix}\n", "0.8 & 0.4 \\\\\n", "0.2 & 0.6 \\\\\n", "\\end{bmatrix} = \n", "\\begin{bmatrix}\n", "0.72 & .56 \\\\\n", "0.28 & 0.44 \\\\\n", "\\end{bmatrix}\n", "$$\n" ] }, { "cell_type": "markdown", "id": "ebac57c5", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "__Example 2.__ And looking at our larger more abstract model again:\n", "\n", "
\n", "\n", "\"Figure\"\n", " \n", "
\n", "\n", "If at time $t= 0$ the chain is in state 1, at time $t = 2$ it will either be in state 1 (with probability 0.6) or state 3 (with probability 0.4)." ] }, { "cell_type": "markdown", "id": "17ea8c74", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can confirm this by computing $P^2$:\n", "\n", "$$ P^2 = \n", "\\begin{bmatrix}\n", "0 & 0 & 0 & 0 \\\\\n", "0.3 & 0.6 & 0 & 0 \\\\\n", "0.42 & 0 & 0.6 & 0 \\\\\n", "0.28 & 0.4 & 0.4 & 1 \n", "\\end{bmatrix}\n", "$$\n" ] }, { "cell_type": "markdown", "id": "50679676", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Distributions Over States" ] }, { "cell_type": "markdown", "id": "1d5ba88f", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Now suppose we don't know with certainty the initial state of the Markov chain, but rather we know a probabiity distribution over the initial states." ] }, { "cell_type": "markdown", "id": "cff410be", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We will organize this distribution into a column vector $\\mathbf{x}$. Since it is a distribution, the entries in $\\mathbf{x}$ are nonnegative and sum to 1." ] }, { "cell_type": "markdown", "id": "2f672d67", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This distribution will evolve in time, so we will denote the distribution at time $n$ as $\\mathbf{x}_n$." ] }, { "cell_type": "markdown", "id": "104c9f50", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "In other words, $X_0$ is a random variable with distribution $P(X_0 = i) = \\mathbf{x}_{0,i}$." ] }, { "cell_type": "markdown", "id": "aaaece02", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Now, we can show that taking one step forward in time corresponds to multiplying $P$ times $\\mathbf{x}$ as follows:\n", "\n", "$$ \\mathbf{x}_{n+1,i} = P(X_{n+1} = i) = \\sum_j P(X_{n+1} = i\\,\\vert\\,X_n = j)\\,P(X_n = j) $$\n", "\n", "(by the law of total probability), " ] }, { "cell_type": "markdown", "id": "d4f2ab9f", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "so\n", "\n", "$$ \\mathbf{x}_{n+1,i} = \\sum_j P_{ij} \\mathbf{x}_{n,j} $$" ] }, { "cell_type": "markdown", "id": "f83abefa", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "so\n", "\n", "$$ \\mathbf{x}_{n+1} = P\\mathbf{x}_n. $$\n", "\n", "This is called the __Forward Kolmogorov Equation.__ This equation shows how to evolve the probability in time. " ] }, { "cell_type": "markdown", "id": "7de3fd18", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "It is clear then that\n", "\n", "$$ \\mathbf{x}_n = P^n \\mathbf{x}_0. $$\n", "\n", "That is, if we know the initial probability distribution at time 0, we can find the distribution at any later time using powers of the matrix $P$." 
] }, { "cell_type": "markdown", "id": "8c5cf73d", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "__Example 1.__\n", "\n", "If we say that an aribtrary day in Boston has an 80% chance of being sunny, we can calculate whether it will be sunny two days later.\n", "\n", "We consider an initial probability distribution over states:\n", " \n", "$$ \\mathbf{x}_0 = \\begin{bmatrix}0.8\\\\0.2\\end{bmatrix} $$\n" ] }, { "cell_type": "markdown", "id": "f8555675", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Then to compute the distribution two days later, ie $\\mathbf{x}_2,$ we can use $P^2$:\n", "\n", "\n", "$$ \\mathbf{x}_2 = P^2\\mathbf{x}_0 = \n", "\\begin{bmatrix}\n", "0.72 & .56 \\\\\n", "0.28 & 0.44 \\\\\n", "\\end{bmatrix} \\, \\begin{bmatrix}0.8\\\\0.2\\end{bmatrix} = \\begin{bmatrix}0.688\\\\0.312\\end{bmatrix}\n", "$$" ] }, { "cell_type": "markdown", "id": "d2cf3bd6", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "__Example 2.__\n", "\n", "Continuing our larger more abstract model again, if we consider an initial probability distribution over states:\n", " \n", "$$ \\mathbf{x}_0 = \\begin{bmatrix}0.1\\\\0.3\\\\0.2\\\\0.4\\end{bmatrix} $$\n" ] }, { "cell_type": "markdown", "id": "6dad9582", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "\n", "Then to compute the distribution two steps later, ie $\\mathbf{x}_2,$ we can use $P^2$:\n", "\n", "$$ \\mathbf{x}_2 = P^2\\mathbf{x}_0 = \n", "\\begin{bmatrix}\n", "0 & 0 & 0 & 0 \\\\\n", "0.3 & 0.6 & 0 & 0 \\\\\n", "0.42 & 0 & 0.6 & 0 \\\\\n", "0.28 & 0.4 & 0.4 & 1 \n", "\\end{bmatrix} \\, \\begin{bmatrix}0.1\\\\0.3\\\\0.2\\\\0.4\\end{bmatrix} = \\begin{bmatrix}0\\\\0.21\\\\0.162\\\\0.628\\end{bmatrix}\n", "$$\n" ] }, { "cell_type": "markdown", "id": "852dff12", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Long Term Behavior\n", "\n", "One of the main questions we'll be asking is concerned with the __long run__ behavior of a Markov Chain." ] }, { "cell_type": "markdown", "id": "d62a1ca4", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Let's consider the long run behavior of Boston weather. We start with just the knowledge that if today is sunny, tomorrow will be sunny with probability 0.8; otherwise it will be rainy. If today is rainy, tomorrow will be rainy with probabililty 0.6; otherwise it will be sunny. \n", "\n", "We know we can calculate the probabilty of a sunny day in 2 days or in 10 days, but can we say anything interesting about the weather in 100 or 1,000 days?" 
] }, { "cell_type": "markdown", "id": "3cc8e54a", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Let's start by writing out the one-step transition matrix for this problem:\n", "\n", "$$ P = \\begin{bmatrix}0.8 & 0.4\\\\0.2 & 0.6\\end{bmatrix}$$" ] }, { "cell_type": "markdown", "id": "21e863ba", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Now, let's assume that day 0 is sunny, and let the chain \"run\" for a while:" ] }, { "cell_type": "code", "execution_count": 6, "id": "5a31addd", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "On day 0, state is [1 0]\n", "On day 1, state is [0.8 0.2]\n", "On day 2, state is [0.72 0.28]\n", "On day 3, state is [0.688 0.312]\n", "On day 4, state is [0.6752 0.3248]\n", "On day 5, state is [0.67008 0.32992]\n", "On day 6, state is [0.668032 0.331968]\n", "On day 7, state is [0.6672128 0.3327872]\n", "On day 8, state is [0.66688512 0.33311488]\n", "On day 9, state is [0.66675405 0.33324595]\n" ] } ], "source": [ "# day 0 is sunny\n", "x = np.array([1, 0])\n", "\n", "# P is one-step transition matrix\n", "P = np.array([[0.8, 0.4], [0.2, 0.6]])\n", "\n", "for i in range(10):\n", " print (f'On day {i}, state is {x}')\n", " x = P @ x" ] }, { "cell_type": "markdown", "id": "a82f2672", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This seems to be converging to a particular vector: $\\begin{bmatrix} 2/3 \\\\ 1/3 \\end{bmatrix}$." ] }, { "cell_type": "markdown", "id": "453b5307", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "What if instead, the chain starts from a rainy day?" ] }, { "cell_type": "code", "execution_count": 9, "id": "ab930983", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "On day 0, state is [0 1]\n", "On day 1, state is [0.4 0.6]\n", "On day 2, state is [0.56 0.44]\n", "On day 3, state is [0.624 0.376]\n", "On day 4, state is [0.6496 0.3504]\n", "On day 5, state is [0.65984 0.34016]\n", "On day 6, state is [0.663936 0.336064]\n", "On day 7, state is [0.6655744 0.3344256]\n", "On day 8, state is [0.66622976 0.33377024]\n", "On day 9, state is [0.6664919 0.3335081]\n" ] } ], "source": [ "# day 0 is rainy\n", "x = np.array([0, 1])\n", "\n", "# P is one-step transition matrix\n", "P = np.array([[0.8, 0.4], [0.2, 0.6]])\n", "\n", "for i in range(10):\n", " print (f'On day {i}, state is {x}')\n", " x = P @ x" ] }, { "cell_type": "markdown", "id": "74a1e97d", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Here too, the chain seems to be converging to the same vector $\\begin{bmatrix} 2/3 \\\\ 1/3 \\end{bmatrix}$." ] }, { "cell_type": "markdown", "id": "9f7329ba", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This suggests that $\\begin{bmatrix} 2/3 \\\\ 1/3 \\end{bmatrix}$ is a special vector. 
" ] }, { "cell_type": "markdown", "id": "8ffbcf0c", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Indeed, we can see that if the chain ever exactly reached this vector, it would never deviate after that, because\n", "\n", "$$ \\begin{bmatrix}0.8 & 0.4\\\\0.2 & 0.6\\end{bmatrix} \\begin{bmatrix} 2/3 \\\\ 1/3 \\end{bmatrix} = \\begin{bmatrix} 2/3 \\\\ 1/3 \\end{bmatrix}$$" ] }, { "cell_type": "markdown", "id": "586fd764", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "A vector $\\mathbf{x}$ for which $P \\mathbf{x} = \\mathbf{x}$ is called a __steady state__ of the Markov chain $P$." ] }, { "cell_type": "markdown", "id": "a33b4cfb", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "__Example.__ If the Markov chain below is in state $\\begin{bmatrix}\\frac{6}{11}\\\\\\frac{3}{11}\\\\\\frac{2}{11}\\end{bmatrix}$, is it in a steady state?\n", "\n", "\n", "
\n", "\n", "\"Figure\"\n", " \n", "
\n", "\n" ] }, { "cell_type": "markdown", "id": "6d365a08", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The transition matrix $P=\\begin{bmatrix}0.8&0.2&0.3\\\\0.1&0.6&0.3\\\\0.1&0.2&0.4\\end{bmatrix}$ and the state $x=\\begin{bmatrix}\\frac{6}{11}\\\\\\frac{3}{11}\\\\\\frac{2}{11}\\end{bmatrix}$\n", "\n", "We just need to check if $Px=x$" ] }, { "cell_type": "markdown", "id": "2acde355", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "\n", "$$\\begin{bmatrix}0.8&0.2&0.3\\\\0.1&0.6&0.3\\\\0.1&0.2&0.4\\end{bmatrix} \\begin{bmatrix}\\frac{6}{11}\\\\\\frac{3}{11}\\\\\\frac{2}{11}\\end{bmatrix}= \\begin{bmatrix}\\frac{6}{11}\\\\\\frac{3}{11}\\\\\\frac{2}{11}\\end{bmatrix}$$. \n", "\n", "So yes, the chain is in a steady state." ] }, { "cell_type": "markdown", "id": "19031b5a", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Convergence to Steady-State" ] }, { "cell_type": "markdown", "id": "460de279", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "So, two remarkable things are happening in the example above:\n", " \n", "1. There is a steady-state vector $\\mathbf{x}$ such that $P\\mathbf{x} = \\mathbf{x}$.\n", "2. It seems that the chain __converges__ to this steady-state, __no matter__ what the starting point is." ] }, { "cell_type": "markdown", "id": "63ad1d24", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "These properties are a very interesting and useful behavior, because it means we can talk about the __long term__ behavior of the Markov Chain without worrying about __what state the chain started in.__\n", "\n", "In particular this will turn out to be hugely useful for problems in Bayesian statistics (which we will return to in the next lecture)." ] }, { "cell_type": "markdown", "id": "ff14c259", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "So we would very much like to know\n", "\n", " \n", "1. When does P have a steady-state?\n", "2. When can we be sure that P converges to that steady state?\n" ] }, { "cell_type": "markdown", "id": "8bc84386", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "To answer these questions we'll use a little bit of linear algebra and some ideas from the study of graphs." ] }, { "cell_type": "markdown", "id": "332589c6", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "
~~~Extra Background Material for the Curious~~~
\n", "\n", "First, recall that when a square matrix $A$ has a nonzero vector $\\mathbf{x}$ such that\n", "\n", "$$ A\\mathbf{x} = \\lambda\\mathbf{x} $$\n", "\n", "for some scalar value $\\lambda$, then we say that $\\mathbf{x}$ is an __eigenvector__ of $A$, and $\\lambda$ is its associated __eigenvalue.__" ] }, { "cell_type": "markdown", "id": "b39df8ae", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Now $P$ has two special properties:\n", "1. None of its entries are negative\n", "2. Its columns sum to 1\n" ] }, { "cell_type": "markdown", "id": "09b25c9a", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Because of these two properties, we can prove that the largest eigenvalue $\\lambda$ of $P$ is 1. \n", "\n", "This is not hard to prove (and would be proved in a linear algebra course) but we'll not prove it here. The key to proving it is based on the fact that the columns of $P$ sum to 1.\n", "\n", "For $\\lambda=1$ we see that $ A\\mathbf{x} = \\mathbf{x} $ making $x$ a steady state.\n", "\n", "
~~~End of Extra Background Material~~~
" ] }, { "cell_type": "markdown", "id": "891a9f2b", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This means that __every Markov chain has at least one steady-state.__" ] }, { "cell_type": "markdown", "id": "75d9a6f3", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "OK, so next we'd like to establish whether the chain has a __unique__ steady-state, that is, does it have only one steady-state?" ] }, { "cell_type": "markdown", "id": "6b922903", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Let's consider the following Markov chain:\n", "\n", "
\n", "\n", "\"Figure\"\n", " \n", "
" ] }, { "cell_type": "markdown", "id": "51686a6c", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "For this chain, the distribution $\\begin{bmatrix}1\\\\0\\\\0\\end{bmatrix}$ is a steady-state, as is the distribution $\\begin{bmatrix}0\\\\0\\\\1\\end{bmatrix}$." ] }, { "cell_type": "markdown", "id": "df6262e5", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "In fact, any convex combination of these vectors is a steady state, such as $\\begin{bmatrix}.6\\\\0\\\\.4\\end{bmatrix}$.\n", "\n", "So this Markov chain has an infinite set of steady-state vectors. " ] }, { "cell_type": "markdown", "id": "5e356017", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "When does a Markov chain have just a __single__ steady-state?" ] }, { "cell_type": "markdown", "id": "9969fa05", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The key to the answer comes from looking at the graph above. Once the chain reaches state 0, it can never reach the other states in the future. Likewise for state 2." ] }, { "cell_type": "markdown", "id": "a817d76e", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We say that a state $i$ is __reachable__ from state $j$ if one can get from $j$ to $i$ following directed edges." ] }, { "cell_type": "markdown", "id": "80cd08bc", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Consider the previous example again:\n", " \n", "
\n", "\n", "\"Figure\"\n", " \n", "
\n", "\n", "Here, state 0 is __not reachable__ from any other state.\n", "\n", "Furthermore, state 3 can __not reach__ any other state." ] }, { "cell_type": "markdown", "id": "ad6682db", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Now we can answer the question:\n", " \n", "If in a Markov chain $P$, every state is reachable from every other state, we call the graph __strongly connected.__ In that case, the Markov chain is __irreducible,__ and it has __only one steady-state.__" ] }, { "cell_type": "markdown", "id": "0c97e13f", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Now, does an irreducible Markov chain always converge to its unique steady-state?" ] }, { "cell_type": "markdown", "id": "d1c33367", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Consider this example:\n", " \n", "
\n", "\n", "\"Figure\"\n", " \n", "
\n" ] }, { "cell_type": "markdown", "id": "5c85906a", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This chain is irreducible: each state can reach the other. So it has a unique steady-state, which is $\\begin{bmatrix}0.5\\\\0.5\\end{bmatrix}$." ] }, { "cell_type": "markdown", "id": "c037a816", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "However, it does not converge to that steady state. \n", "\n", "Consider if it starts in state $\\begin{bmatrix}1\\\\0\\end{bmatrix}$. In the next step it will be $\\begin{bmatrix}0\\\\1\\end{bmatrix}$, and then back to $\\begin{bmatrix}1\\\\0\\end{bmatrix}$." ] }, { "cell_type": "markdown", "id": "f22c1dab", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This cycle will continue forever.\n", "\n", "Another way to think about this is that the future state of the Markov chain is never independent of its starting state." ] }, { "cell_type": "markdown", "id": "bf8a04a7", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "So the Markov chain must not be __periodic__ in the way that this example is." ] }, { "cell_type": "markdown", "id": "e6e7b68c", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "There is a way to confirm that a Markov chain is not periodic." ] }, { "cell_type": "markdown", "id": "f736ccfa", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "A Markov chain is not periodic if there is some power of $P$, say $P^k$, such that all entries in $P^k$ are positive." ] }, { "cell_type": "markdown", "id": "00a18d65", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "You can see that the transition matrix for the example above is\n", "\n", "$$ P = \\begin{bmatrix}0&1\\\\1&0\\end{bmatrix} $$" ] }, { "cell_type": "markdown", "id": "88395782", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "and you can confirm that no power of $P$ is all positive." ] }, { "cell_type": "markdown", "id": "4bf20864", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "When a Markov chain $P$ has some power $k$ such that all entries in $P^k$ are positive, then the chain is said to be __primitive.__" ] }, { "cell_type": "markdown", "id": "9bd724c1", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "And now we can complete the story:\n", "\n", "
A primitive (i.e., irreducible and aperiodic) Markov chain has a single steady-state to which it always converges.
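As a quick numerical check of this distinction (a small sketch comparing the two-state periodic chain above with the Boston weather chain), we can look at powers of each transition matrix:

```python
import numpy as np

P_periodic = np.array([[0, 1],
                       [1, 0]])      # the two-state periodic chain above
P_weather  = np.array([[0.8, 0.4],
                       [0.2, 0.6]])  # the Boston weather chain

for k in range(1, 6):
    print(k,
          np.all(np.linalg.matrix_power(P_periodic, k) > 0),   # never True: not primitive
          np.all(np.linalg.matrix_power(P_weather, k) > 0))    # True from k = 1: primitive
```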
" ] }, { "cell_type": "markdown", "id": "6eb2cfbe", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "```{margin} \n", "A Markov chain that is irreducible but not primitive has a single eigenvalue of 1, but will have other eigenvalues whose _modulus_ is 1. For example, this chain has an eigenvalue of -1. In other cases, the eigenvalues will be complex with modulus 1. The theory behind these results is called Perron-Frobenius theory, and it is a rich subject. A good reference is Chapter 8 of _Matrix Analysis and Applied Linear Algebra_, by Carl D. Meyer.\n", "```" ] }, { "cell_type": "markdown", "id": "837dd45d", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "__Example.__ In the following chain, for each node, all transitions out of that node are equally probable. So for a node with 2 outgoing transitions, the probabilities are each 1/2. Is this chain a primitive Markov Chain?\n", "\n", "\n", "
\n", "\n", "\"Figure\"\n", " \n", "
\n", "\n" ] }, { "cell_type": "markdown", "id": "00de20a5", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "First, is it a Markov chain?" ] }, { "cell_type": "markdown", "id": "4d69202a", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "__We need to check if the columns add up to 1.__\n", "\n", "$$P = \\begin{bmatrix}0.5&0.5&0\\\\0.5&0&0.5\\\\0&0.5&0.5\\end{bmatrix}$$\n" ] }, { "cell_type": "markdown", "id": "47d1e0a2", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Second, does it have at least one steady state?" ] }, { "cell_type": "markdown", "id": "a02804ee", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "__Yes, all Markov chains have at least one steady state.__" ] }, { "cell_type": "markdown", "id": "979b37e5", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Third, does it have exactly one steady state?" ] }, { "cell_type": "markdown", "id": "75379d28", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "__We need to check if every state is reachable from every other state__" ] }, { "cell_type": "markdown", "id": "3289029e", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Does it converege to that steady state?" ] }, { "cell_type": "markdown", "id": "3058c505", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "__We need to check if our Markov chain $P$ has some power $k$ such that all entries in $P^k$ are positive__" ] }, { "cell_type": "markdown", "id": "391d3bbe", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Let's check $k=2$ first:\n", "\n", "$$P^2 = \\begin{bmatrix}0.5&0.5&0\\\\0.5&0&0.5\\\\0&0.5&0.5\\end{bmatrix} \\begin{bmatrix}0.5&0.5&0\\\\0.5&0&0.5\\\\0&0.5&0.5\\end{bmatrix} = \\begin{bmatrix}0.5&0.25&0.25\\\\0.25&0.5&0.25\\\\0.25&0.5&0.25\\end{bmatrix}$$\n" ] }, { "cell_type": "markdown", "id": "33639793", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We already see only positive entries in $P^2$ so the Markov chain convereges.\n", "\n", "Therefore our example is indeed a primitive Markov chain." ] }, { "cell_type": "markdown", "id": "6fbf9d54", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Time Averages and Ensemble Averages\n", "\n", "There are two senses in which we can interpret the steady-state of a Markov chain." ] }, { "cell_type": "markdown", "id": "f701fb0e", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "
~~~Extra Background Material for the Curious~~~
\n", "\n", "Say we have a Markov chain with steady-state vector $\\mathbf{x}$." ] }, { "cell_type": "markdown", "id": "2ff22007", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "First, consider the evolution of states $X_0, X_1, \\dots.$ At some time long in the future, the probability that the chain is in state $i$ is given by component $i$ of $\\mathbf{x}$. We say that the steady-state represents the __time average__ state, because if we observe the Markov chain over a sequence of steps while it is in steady-state, component $i$ of $\\mathbf{x}$ tells us how often we will see it in state $i$." ] }, { "cell_type": "markdown", "id": "56a9e3ee", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "In contrast, consider if we observed many copies of $P$ evolving in parallel. At some time long in the future, we stop all the chains and look at the distribution of states they are in. This is called the __ensemble average__ of the Markov chain.\n", "\n", "
~~~End of Extra Background Material~~~
" ] }, { "cell_type": "markdown", "id": "c3eebe14", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "It is a wonderful fact that for a __primitive__ Markov chain (with a finite state space), the __time average__ and the __ensemble average__ are the same. This is going to be very helpful for us." ] }, { "cell_type": "markdown", "id": "16643b48", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The property of having identical time average and ensemble average is called __ergodicity.__\n", "\n", "So we can say that **a primitive Markov chain is ergodic**" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" }, "rise": { "scroll": false, "theme": "serif", "transition": "fade" } }, "nbformat": 4, "nbformat_minor": 5 }